I am working through this:
https://medium.com/diogo-menezes-borges/introduction-to-statistics-for-data-science-6c246ed2468d
About 3/4 of the way through there is a histogram, but the author does not supply the code used to generate it.
So I decided to give it a go...
I have everything working, but I would like to add minor ticks to my plot.
X-axis only, spaced 200 units apart (matching the bin width used in my code).
In particular, I would like to add minor ticks in the style from the last example from here:
https://matplotlib.org/3.1.0/gallery/ticks_and_spines/major_minor_demo.html
I have tried several times but I just can't get that exact 'style' to work on my plot.
Here is my working code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

print('NumPy: {}'.format(np.__version__))
print('Pandas: {}'.format(pd.__version__))
print('\033[1;31m' + '--------------' + '\033[0m')  # Bold red

display_settings = {
    'max_columns': 15,
    'max_colwidth': 60,
    'expand_frame_repr': False,  # Don't wrap wide frames across multiple lines
    'max_rows': 50,
    'precision': 6,
    'show_dimensions': False
}
# pd.options.display.float_format = '{:,.2f}'.format
for op, value in display_settings.items():
    pd.set_option("display.{}".format(op), value)

file = "e:\\python\\pandas\\medium\\sets.csv"
lego = pd.read_csv(file, encoding="utf-8")

print(lego.shape, '\n')
lego.info()  # info() prints its summary itself and returns None
print()
print(lego.head(), '\n')
print(lego.isnull().sum(), '\n')

dfs = [lego]
names = ['lego']


def NaN_percent(_df, column_name):
    # empty_values = row_count - _df[column_name].count()
    empty_values = _df[column_name].isnull().sum()
    return (100.0 * empty_values) / row_count  # row_count is set in the loop below


c = 0
print('Columns with missing values expressed as a percentage.')
for df in dfs:
    print('\033[1;31m' + ' ' + names[c] + '\033[0m')
    row_count = df.shape[0]
    for i in list(df):
        x = NaN_percent(df, i)
        if x > 0:
            print(' ' + i + ': ' + str(x.round(4)) + '%')
    c += 1
    print()

# What is the average number of parts in the sets of legos?
print(lego['num_parts'].mean(), '\n')
# What is the median number of parts in the sets of legos?
print(lego['num_parts'].median(), '\n')
print(lego['num_parts'].max(), '\n')

# Create Bins for Data Ranges
bins = []
for i in range(lego['num_parts'].min(), 6000, 200):
    bins.append(i + 1)

# Use 'right' to determine which bin overlapping values fall into.
cuts = pd.cut(lego['num_parts'], bins=bins, right=False)
# Count values in each bin.
print(cuts.value_counts(), '\n')

plt.hist(lego['num_parts'], color='red', edgecolor='black', bins=bins)
plt.title('Histogram of Number of parts')
plt.xlabel('Bin')
plt.ylabel('Number of values per bin')
plt.axvline(x=162.2624, color='blue')
plt.axvline(x=45.0, color='green', linestyle='--')

# https://matplotlib.org/gallery/text_labels_and_annotations/custom_legends.html
legend_elements = [Line2D([0], [0], color='blue', linewidth=2, linestyle='-'),
                   Line2D([0], [1], color='green', linewidth=2, linestyle='--')
                   ]
labels = ['mean: 162.2624', 'median: 45.0']
plt.legend(legend_elements, labels)
plt.show()
You can just add the following before plt.show():
from matplotlib.ticker import AutoMinorLocator

ax = plt.gca()
ax.xaxis.set_minor_locator(AutoMinorLocator())
ax.tick_params(which='minor', length=4, color='r')
See this post to get a better idea of the difference between plt, ax and fig. In broad terms, plt refers to Matplotlib's pyplot library. fig is one "plot" that can consist of one or more subplots. ax refers to one subplot and the x- and y-axes defined for it, including the measuring units, tick marks, tick labels, etc. Many functions in Matplotlib are often called as plt.hist, but in the underlying code they draw on the "current axes", which can be obtained via plt.gca() ("get current axes"). It is not always clear which functions can be called via plt. and which only exist via ax., and sometimes the two forms have slightly different names (for example, plt.xlim versus ax.set_xlim). You'll need to check the documentation or search Stack Overflow to see which form is needed in each specific case.
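Since the question asks for minor ticks exactly 200 units apart, a MultipleLocator(200) can be used for the minor locator instead of AutoMinorLocator, which only subdivides whatever major spacing Matplotlib picks. The sketch below is a minimal illustration under stated assumptions: num_parts is hypothetical random data standing in for lego['num_parts'], the major spacing of 1000 is an arbitrary choice, and the tick styling copies the last example of the major/minor demo linked in the question.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator

# Hypothetical stand-in for lego['num_parts']: skewed, non-negative counts.
rng = np.random.default_rng(0)
num_parts = rng.exponential(scale=160, size=2000).astype(int)

# Bin edges 200 units apart, matching the bin width in the question.
bins = np.arange(0, 6001, 200)

fig, ax = plt.subplots()
ax.hist(num_parts, bins=bins, color='red', edgecolor='black')

# Major ticks every 1000 units (arbitrary), minor ticks every 200 units.
ax.xaxis.set_major_locator(MultipleLocator(1000))
ax.xaxis.set_minor_locator(MultipleLocator(200))

# Tick styling in the style of the last major/minor demo example.
ax.tick_params(which='both', width=2)
ax.tick_params(which='major', length=7)
ax.tick_params(which='minor', length=4, color='r')
ax.set_xlim(0, 6000)
# Call plt.show() to display the figure in an interactive session.
```

Because the minor spacing equals the bin width, every bin edge gets a minor tick; minor ticks that coincide with a major tick are hidden by Matplotlib.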
Related
I'm developing a set of graphs to plot some pandas DataFrame values. For that I'm using various pandas, NumPy and Matplotlib modules and functions, with the following code:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import matplotlib.ticker as ticker

data = {'Name': ['immoControlCmd', 'BrkTerrMde', 'GlblClkYr', 'HsaStat', 'TesterPhysicalResGWM', 'FapLc', 'FirstRowBuckleDriver', 'GlblClkDay'],
        'Value': [0, 5, 0, 4, 0, 1, 1, 1],
        'Id_Par': [0, 0, 3, 3, 3, 3, 0, 0]
        }

signals_df = pd.DataFrame(data)


def plot_signals(signals_df):
    # Count signals by par
    signals_df['Count'] = signals_df.groupby('Id_Par').cumcount().add(1).mask(signals_df['Id_Par'].eq(0), 0)
    # Subtract Par values from the index column
    signals_df['Sub'] = signals_df.index - signals_df['Count']
    id_par_prev = signals_df['Id_Par'].unique()
    id_par = np.delete(id_par_prev, 0)
    signals_df['Prev'] = [1 if x in id_par else 0 for x in signals_df['Id_Par']]
    signals_df['Final'] = signals_df['Prev'] + signals_df['Sub']
    # signals_df['Finall'] = signals_df['Final'].unique()
    # print(signals_df['Finall'])

    # Convert and set Subtract to index
    signals_df.set_index('Final', inplace=True)
    # pos_x = len(signals_df.index.unique()) - 1
    # print(pos_x)

    # Get individual names and variables for the chart
    names_list = [name for name in signals_df['Name'].unique()]
    num_names_list = len(names_list)
    num_axis_x = len(signals_df["Name"])

    # Creation Graphics
    fig, ax = plt.subplots(nrows=num_names_list, figsize=(10, 10), sharex=True)
    plt.xticks(np.arange(0, num_axis_x), color='SteelBlue', fontweight='bold')

    for pos, (a_, name) in enumerate(zip(ax, names_list)):
        # Get data
        data = signals_df[signals_df["Name"] == name]["Value"]
        # Get values axis-x and axis-y
        x_ = np.hstack([-1, data.index.values, len(signals_df) - 1])
        # print(data.index.values)
        y_ = np.hstack([0, data.values, data.iloc[-1]])
        # Plotting the data by position
        ax[pos].plot(x_, y_, drawstyle='steps-post', marker='*', markersize=8, color='k', linewidth=2)
        ax[pos].set_ylabel(name, fontsize=8, fontweight='bold', color='SteelBlue', rotation=30, labelpad=35)
        ax[pos].yaxis.set_major_formatter(ticker.FormatStrFormatter('%0.1f'))
        ax[pos].yaxis.set_tick_params(labelsize=6)
        ax[pos].grid(alpha=0.4, color='SteelBlue')

    plt.show()


plot_signals(signals_df)
What I want is to remove the x-axis positions where nothing is plotted (the positions that are not marked on the graph), while keeping the values and names as in the image at the end. In pandas terms, these positions come from the "Final" column, which I set as the index before plotting the subplots and in which some values are repeated. In other words, I want to remove the values enclosed in the red box from the graph, but keep the values and names as in the image at the end:
                      Name  Value  Id_Par  Count  Sub  Prev
Final
0           immoControlCmd      0       0      0    0     0
1               BrkTerrMde      5       0      0    1     0
2                GlblClkYr      0       3      1    1     1
2                  HsaStat      4       3      2    1     1
2     TesterPhysicalResGWM      0       3      3    1     1
2                    FapLc      1       3      4    1     1
6     FirstRowBuckleDriver      1       0      0    6     0
7               GlblClkDay      1       0      0    7     0
I've tried taking the unique values of the last column, which should become the x-axis values, but because the result has a different length from the DataFrame I get ValueError: Length of values (5) does not match length of index (8). I would then have to resize my chart, but I don't understand how to do that in this case:
signals_df['Final'] = signals_df['Prev'] + signals_df['Sub']
signals_df['Finall'] = signals_df['Final'].unique()
print(signals_df['Finall'])
I've also tried taking the number of unique index values (after setting 'Final' as the index) and subtracting it from data.index.values in the x_ variable, but that doesn't give me what I want, because it subtracts the same amount from all the values at once instead of adjusting each one separately:
signals_df.set_index('Final', inplace=True)
pos_x = len(signals_df.index.unique()) - 1
...
x_ = np.hstack([-1, data.index.values - pos_x, len(signals_df) - 1])
Is there a pandas and/or Matplotlib function that lets me do this? Or could someone give me a suggestion that will help me better understand how to do it? What I expect to achieve is the plot below:
I really appreciate your help; any comments are welcome.
I'm using Python 3.6.5, pandas 1.1.5 and Matplotlib 3.3.2.
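As an aside on the ValueError quoted in the question: Series.unique() returns only the distinct values, so its length no longer matches the number of rows and it cannot be assigned as a column. A minimal sketch, using the 'Final' values from the table in the question as hypothetical input, shows the mismatch and one swapped-in alternative that is not what the answer below uses: pd.factorize, which keeps one code per row but renumbers the values densely, closing the gaps.

```python
import pandas as pd

# 'Final' values copied from the table in the question (one per row, with repeats).
final = pd.Series([0, 1, 2, 2, 2, 2, 6, 7], name='Final')
df = final.to_frame()

# unique() returns only the 5 distinct values, so it cannot be stored as a
# column of an 8-row DataFrame: pandas raises a length-mismatch ValueError.
unique_final = final.unique()
try:
    df['Finall'] = unique_final
    length_error = False
except ValueError:
    length_error = True

# pd.factorize keeps one code per row but renumbers the values densely
# (0, 1, 2, ...) in order of first appearance, which closes the gap at 3-5.
codes, uniques = pd.factorize(final)
df['Pos'] = codes
```

The dense codes could then serve as x positions, with uniques passed to ax.set_xticks(range(len(uniques))) and ax.set_xticklabels(uniques) so the original 'Final' values still appear as tick labels.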
One possible way to do this is to make your x-axis values strings, so that matplotlib makes a "categorical" plot. See examples of that here.
For your case, because you have subplots which would have different values, and they are not always in the right order, we need to do a bit of trickery first to make sure the ticks appear in the correct order. For that, we can use the approach from this answer, where they plot something that uses all of the x values in the correct order, and then remove it.
To gather all the xtick values together, you can do something like this, where you create a list of the values, reduce it to the unique values using a set, then sort those values, and convert to strings using a list comprehension and str():
# First make a list of all the xticks we want
xvals = [-1, ]
for name in names_list:
    xvals.append(signals_df[signals_df["Name"] == name]["Value"].index.values[0])
xvals.append(len(signals_df) - 1)
# Reduce to only unique values, sorted, and then convert to strings
xvals = [str(i) for i in sorted(set(xvals))]
Once you have those, you can make a dummy plot, and then remove it, like so (this is to fix the tick positions in the correct order). NOTE that this needs to be inside your plotting loop for matplotlib versions 3.3.4 and earlier:
# To get the ticks in the right order on all subplots, we need to make
# a dummy plot here and then remove it
# (numeric zeros keep the y-axis numeric; np.zeros_like on strings would give empty strings)
dummy, = ax[0].plot(xvals, np.zeros(len(xvals)))
dummy.remove()
Finally, when you actually plot the real data inside the loop, you just need to convert x_ to strings as you plot them:
ax[pos].plot(x_.astype('str'), y_, drawstyle='steps-post', marker='*', markersize=8, color='k', linewidth=2)
The only other change I made was to stop setting the xtick positions explicitly (which you did with plt.xticks), but you can still use that command to set the font colour and weight:
plt.xticks(color='SteelBlue', fontweight='bold')
And this is the output:
For completeness, here I have put it all together in your script:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import matplotlib.ticker as ticker
import matplotlib

print(matplotlib.__version__)

data = {'Name': ['immoControlCmd', 'BrkTerrMde', 'GlblClkYr', 'HsaStat', 'TesterPhysicalResGWM', 'FapLc',
                 'FirstRowBuckleDriver', 'GlblClkDay'],
        'Value': [0, 5, 0, 4, 0, 1, 1, 1],
        'Id_Par': [0, 0, 3, 3, 3, 3, 0, 0]
        }

signals_df = pd.DataFrame(data)


def plot_signals(signals_df):
    # Count signals by par
    signals_df['Count'] = signals_df.groupby('Id_Par').cumcount().add(1).mask(signals_df['Id_Par'].eq(0), 0)
    # Subtract Par values from the index column
    signals_df['Sub'] = signals_df.index - signals_df['Count']
    id_par_prev = signals_df['Id_Par'].unique()
    id_par = np.delete(id_par_prev, 0)
    signals_df['Prev'] = [1 if x in id_par else 0 for x in signals_df['Id_Par']]
    signals_df['Final'] = signals_df['Prev'] + signals_df['Sub']
    # signals_df['Finall'] = signals_df['Final'].unique()
    # print(signals_df['Finall'])

    # Convert and set Subtract to index
    signals_df.set_index('Final', inplace=True)
    # pos_x = len(signals_df.index.unique()) - 1
    # print(pos_x)

    # Get individual names and variables for the chart
    names_list = [name for name in signals_df['Name'].unique()]
    num_names_list = len(names_list)
    num_axis_x = len(signals_df["Name"])

    # Creation Graphics
    fig, ax = plt.subplots(nrows=num_names_list, figsize=(10, 10), sharex=True)
    # No longer any need to define where the ticks go, but still set the colour and weight here
    plt.xticks(color='SteelBlue', fontweight='bold')

    # First make a list of all the xticks we want
    xvals = [-1, ]
    for name in names_list:
        xvals.append(signals_df[signals_df["Name"] == name]["Value"].index.values[0])
    xvals.append(len(signals_df) - 1)
    # Reduce to only unique values, sorted, and then convert to strings
    xvals = [str(i) for i in sorted(set(xvals))]

    for pos, (a_, name) in enumerate(zip(ax, names_list)):
        # To get the ticks in the right order on all subplots,
        # we need to make a dummy plot here and then remove it
        # (numeric zeros keep the y-axis numeric)
        dummy, = ax[pos].plot(xvals, np.zeros(len(xvals)))
        dummy.remove()
        # Get data
        data = signals_df[signals_df["Name"] == name]["Value"]
        # Get values axis-x and axis-y
        x_ = np.hstack([-1, data.index.values, len(signals_df) - 1])
        y_ = np.hstack([0, data.values, data.iloc[-1]])
        # Plotting the data by position
        # NOTE: here we convert x_ to strings as we plot, to make sure they are plotted as categorical values
        ax[pos].plot(x_.astype('str'), y_, drawstyle='steps-post', marker='*', markersize=8, color='k', linewidth=2)
        ax[pos].set_ylabel(name, fontsize=8, fontweight='bold', color='SteelBlue', rotation=30, labelpad=35)
        ax[pos].yaxis.set_major_formatter(ticker.FormatStrFormatter('%0.1f'))
        ax[pos].yaxis.set_tick_params(labelsize=6)
        ax[pos].grid(alpha=0.4, color='SteelBlue')

    plt.show()


plot_signals(signals_df)
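The key idea of this answer, that string x values are treated as categories and placed at consecutive integer slots so unused numeric positions simply disappear, can be isolated in a few lines. This is a minimal sketch with hypothetical x and y values (the gaps at 3, 4 and 5 mimic the unused positions in the question), not the full multi-subplot solution above.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical x positions with a gap (3, 4 and 5 are never used) and step values.
x = np.array([-1, 0, 1, 2, 6, 7])
y = np.array([0, 0, 5, 4, 1, 1])

fig, ax = plt.subplots()
# Passing strings makes Matplotlib treat x as categories: each distinct label
# gets the next integer slot (0, 1, 2, ...) in order of first appearance, so
# the missing positions 3-5 never take up space on the axis.
line, = ax.plot(x.astype(str), y, drawstyle='steps-post', marker='*', color='k')
```

The labels keep their original text ('6', '7') while sitting at slots 4 and 5, which is why the answer needs the dummy plot: categories are ordered by first appearance, not numerically.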
Like many other people on Stack Overflow who are trying to plot a contour graph from 4D data (i.e., X, Y, Z and a corresponding value C), I am attempting to plot a 4D contour map of my data. I have tried many of the solutions suggested on Stack Overflow. Of all the plots suggested, this and this were the closest to what I want, but still not quite what I need in terms of data interpretation. Here is the ideal plot example: (source)
Here is a subset of the data; I put it on Dropbox. Once this data is downloaded to the directory of the Python file, the following code will work. I have modified this script from this post.
import numpy as np
import pandas as pd
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import matplotlib.tri as mtri

##### Importing the data
df = pd.read_csv('Data_4D_plot.csv')

do_random_pt_example = False

index_x = 0
index_y = 1
index_z = 2
index_c = 3
list_name_variables = ['x', 'y', 'z', 'c']
name_color_map = 'seismic'

if do_random_pt_example:
    number_of_points = 200
    x = np.random.rand(number_of_points)
    y = np.random.rand(number_of_points)
    z = np.random.rand(number_of_points)
    c = np.random.rand(number_of_points)
else:
    x = df['X'].to_numpy()
    y = df['Y'].to_numpy()
    z = df['Z'].to_numpy()
    c = df['C'].to_numpy()

# -----
# We create triangles that join 3 points at a time, and their colours are
# determined by the values of the 4th dimension. Each triangle contains 3
# indices corresponding to the row numbers of the points being grouped.
# Different methods can be used to define the value that represents the
# 3 grouped points; some examples are given below.
triangles = mtri.Triangulation(x, y).triangles

choice_calculation_colors = 2
if choice_calculation_colors == 1:  # Mean of the "c" values of the 3 points of the triangle
    colors = np.mean([c[triangles[:, 0]], c[triangles[:, 1]], c[triangles[:, 2]]], axis=0)
elif choice_calculation_colors == 2:  # Median of the "c" values of the 3 points of the triangle
    colors = np.median([c[triangles[:, 0]], c[triangles[:, 1]], c[triangles[:, 2]]], axis=0)
elif choice_calculation_colors == 3:  # Max of the "c" values of the 3 points of the triangle
    colors = np.max([c[triangles[:, 0]], c[triangles[:, 1]], c[triangles[:, 2]]], axis=0)

# ----------
### ===== Adjust this part for the labelling of the graph
list_name_variables[index_x] = 'X (m)'
list_name_variables[index_y] = 'Y (m)'
list_name_variables[index_z] = 'Z (m)'
list_name_variables[index_c] = 'C values'

# Display the 4D graphic.
fig = plt.figure(figsize=(15, 15))
ax = fig.add_subplot(projection='3d')  # fig.gca(projection=...) was removed in Matplotlib 3.6
triang = mtri.Triangulation(x, y, triangles)
surf = ax.plot_trisurf(triang, z, cmap=name_color_map, shade=False, linewidth=0.2)
surf.set_array(colors)
surf.autoscale()

# Add a colour bar with a title to explain which variable is represented by the colour.
cbar = fig.colorbar(surf, shrink=0.5, aspect=5)
cbar.ax.get_yaxis().labelpad = 15
cbar.ax.set_ylabel(list_name_variables[index_c], rotation=270)

# Add titles to the axes.
ax.set_xlabel(list_name_variables[index_x])
ax.set_ylabel(list_name_variables[index_y])
ax.set_zlabel(list_name_variables[index_z])
ax.view_init(elev=15., azim=45)
plt.show()
Here is the output:
Although it looks brilliant, it is not quite what I am looking for (the contour map example above). I modified the following script from this post in the hope of reaching the required graph; however, the chart looks nothing like what I was expecting (something similar to the previous output graph). Warning: the following code may take some time to run.
import matplotlib
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
import pandas as pd

df = pd.read_csv('Data_4D_plot.csv')
x = df['X'].to_numpy()
y = df['Y'].to_numpy()
z = df['Z'].to_numpy()
cc = df['C'].to_numpy()

# convert to 2d matrices
Z = np.outer(z.T, z)
X, Y = np.meshgrid(x, y)
C = np.outer(cc.T, cc)

# fourth dimension - colormap
# create colormap according to cc-value
color_dimension = C  # change to desired fourth dimension
minn, maxx = color_dimension.min(), color_dimension.max()
norm = matplotlib.colors.Normalize(minn, maxx)
m = plt.cm.ScalarMappable(norm=norm, cmap='jet')
m.set_array([])
fcolors = m.to_rgba(color_dimension)

# plot
fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # fig.gca(projection=...) was removed in Matplotlib 3.6
ax.plot_surface(X, Y, Z, rstride=1, cstride=1, facecolors=fcolors, vmin=minn, vmax=maxx, shade=False)
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
plt.show()
Could anyone in the community help me plot a contour figure similar to the example graph (the first image in this post), where the contours are based on the values within the range of C?
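One common way to get stacked contour "slices" in a 3D axes is Axes3D.contourf with zdir='z' and an offset, which draws the filled contours of C(X, Y) on the plane Z = offset. This is a minimal sketch under strong assumptions: the data is hypothetical and already gridded on a regular X-Y mesh at a few Z levels (c_values is an invented stand-in for the measured C). Scattered points like those in Data_4D_plot.csv would first need interpolating onto such a grid, for example with scipy.interpolate.griddata.

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers '3d' on older Matplotlib)

# Hypothetical gridded 4D data: C sampled on a regular X-Y grid at a few Z levels.
x = np.linspace(0, 10, 50)
y = np.linspace(0, 10, 50)
z_levels = [0.0, 5.0, 10.0]
X, Y = np.meshgrid(x, y)


def c_values(X, Y, z):
    # Invented stand-in for the measured C values; replace with real gridded data.
    return np.sin(X / 2) * np.cos(Y / 3) + 0.1 * z


# Use the same contour levels on every slice so the colours are comparable.
all_c = [c_values(X, Y, z) for z in z_levels]
levels = np.linspace(min(c.min() for c in all_c), max(c.max() for c in all_c), 15)

fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(projection='3d')
for z, C in zip(z_levels, all_c):
    # zdir='z' with offset=z draws the filled contours of C on the plane Z = z.
    cs = ax.contourf(X, Y, C, levels=levels, zdir='z', offset=z, cmap='seismic')

ax.set_zlim(min(z_levels), max(z_levels))
ax.set_xlabel('X (m)')
ax.set_ylabel('Y (m)')
ax.set_zlabel('Z (m)')
fig.colorbar(cs, ax=ax, shrink=0.5, label='C values')
# Call plt.show() to display the figure in an interactive session.
```

Sharing one levels array across all slices is a design choice: it makes a given colour mean the same C value on every plane, so a single colour bar applies to the whole figure.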
I am new to coding in Python and I am trying to make a bar chart with percentages on top of the bars. I have a sample DataFrame, Quiz2. My code only displays 1600% above the first bar. Could anyone help me get it right?
#Approach 2
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
sns.set()
%matplotlib inline

Quiz2 = pd.DataFrame({'Kaha': ['16', '5'], 'Shiny': ['16', '10']})
data = Quiz2.rename(index={0: "Male", 1: "Female"})
data = data.astype(float)

Q1p = data[['Kaha', 'Shiny']].plot(kind='bar', figsize=(5, 5), legend=True, fontsize=12)
Q1p.set_xlabel("Gender", fontsize=12)
Q1p.set_ylabel("Number of people", fontsize=12)
#Q1p.set_xticklabels(x_labels)
for p in Q1p.patches:
    width = p.get_width()
    height = p.get_height()
    x, y = p.get_xy()
    Q1p.annotate(f'{height:.0%}', (x + width/2, y + height*1.02), ha='center')
    plt.show()
I want the percentages for Kaha (total 21) to appear as 76.2% for Male and 23.8% for Female, and those for Shiny (total 26) as 61.5% for Male and 38.5% for Female. Any help would be appreciated.
In Approach 2, the reason only one value is displayed is that plt.show() is inside the for loop; it should be outdented so that it runs after the loop has finished. You are getting 1600% because the line beginning with Q1p.annotate(f'{height:.0%}' formats the raw bar height (16) with the % format specifier, which multiplies the value by 100. Instead of height, you need to pass the height divided by a total to get a percentage.
Here is a solution, but I'm not sure I am computing the percentages correctly:
Quiz2 = pd.DataFrame({'Kaha': ['16', '5'], 'Shiny': ['16', '10']})
data = Quiz2.rename(index={0: "Male", 1: "Female"})
data = data.astype(float)
total = len(data)*10
Q1p = data[['Kaha', 'Shiny']].plot(kind='bar', figsize=(5, 5), legend=True, fontsize=12)
Q1p.set_xlabel("Gender", fontsize=12)
Q1p.set_ylabel("Number of people", fontsize=12)
#Q1p.set_xticklabels(x_labels)
for p in Q1p.patches:
    width = p.get_width()
    height = p.get_height()
    x, y = p.get_xy()
    Q1p.annotate(f'{height/total:.0%}', (x + width/2, y + height*1.02), ha='center')
plt.show()
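To get the exact figures the question asks for (76.2% / 23.8% for Kaha, 61.5% / 38.5% for Shiny), each bar needs to be divided by its own column's total rather than by one shared number. A minimal sketch, assuming pandas draws the bars column by column (all Kaha bars, then all Shiny bars), which is how its bar plot adds patches:

```python
import pandas as pd
import matplotlib.pyplot as plt

Quiz2 = pd.DataFrame({'Kaha': ['16', '5'], 'Shiny': ['16', '10']})
data = Quiz2.rename(index={0: "Male", 1: "Female"}).astype(float)

# Each column's total: Kaha = 21, Shiny = 26.
totals = data.sum()

ax = data[['Kaha', 'Shiny']].plot(kind='bar', figsize=(5, 5), legend=True, fontsize=12)
ax.set_xlabel("Gender", fontsize=12)
ax.set_ylabel("Number of people", fontsize=12)

# Bars are added column by column, so repeat each column's total once per
# row to line the totals up with ax.patches.
patch_totals = [totals[col] for col in data.columns for _ in data.index]

labels = []
for p, total in zip(ax.patches, patch_totals):
    height = p.get_height()
    label = f'{height / total:.1%}'  # .1% multiplies by 100 and keeps one decimal
    labels.append(label)
    ax.annotate(label, (p.get_x() + p.get_width() / 2, height * 1.02), ha='center')

plt.show()  # outside the loop, so every bar gets its label first
```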
I have a plot, shown below, with over 1000 x-axis points. I'm trying to reduce the x-axis labels to 3 values (the min, mid and max value) instead of having 1000 labels.
Despite my attempts (between the ###### lines below), all 3 values are either written onto the same tick (on top of each other) or only 1 tick is placed somewhere along the x-axis.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure

figure(num=None, figsize=(20, 10), dpi=80, facecolor='w', edgecolor='k')
ax = plt.gca()
data.plot(kind='bar', x='colA', y='colB', ax=ax)

######
plt.xticks(np.arange(0, 3, step=1))
# ALSO TRIED
plt.xticks([1, 2, 3], ["a", "b", "c"])
######

plt.show()
How can I distribute the min,mid and max value evenly across the X-axis?
If 'colA' is numerical:
x_min = min(data['colA'])
x_max = max(data['colA'])
x_mid = (x_min + x_max) / 2
# use regular division if the numbers are floats, use integer division in case all numbers are integers
plt.xticks([x_min, x_mid, x_max], ["a","b","c"])
# plt.xticks([x_min, x_mid, x_max]) # leave out the labels if the default labels are OK
If instead 'colA' is categorical (strings), the categories are numbered internally as 0, 1, 2, ... up to the number of strings minus one:
x_min = 0
x_max = len(data['colA']) - 1
x_mid = x_max // 2 # integer division
plt.xticks([x_min, x_mid, x_max])
You could try:
min_ = min(data['colA'])
max_ = max(data['colA'])
mid_ = (min_ + max_) / 2.
ax.set_xticks([min_, mid_, max_])
ax.set_xticklabels(['min', 'mid', 'max'])
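One caveat that applies to the question's code: pandas kind='bar' plots always place the bars at positions 0, 1, ..., n-1, even when colA is numeric, and label them with a fixed list of tick labels. The sketch below, which assumes a hypothetical 1000-row DataFrame in place of the question's data, therefore picks the first, middle and last positions by index and passes the matching colA values as labels alongside them, so labels and positions stay in sync.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the question's data: 1000 rows.
data = pd.DataFrame({'colA': np.arange(1000, 2000),
                     'colB': np.random.default_rng(0).random(1000)})

fig, ax = plt.subplots(figsize=(20, 10), dpi=80)
data.plot(kind='bar', x='colA', y='colB', ax=ax, legend=False)

# pandas draws the bars at positions 0 .. n-1 whatever the dtype of colA,
# so choose tick positions by index...
n = len(data)
positions = [0, (n - 1) // 2, n - 1]
# ...and label them with the colA values found at those rows.
labels = data['colA'].iloc[positions].astype(str).tolist()
ax.set_xticks(positions)
ax.set_xticklabels(labels, rotation=0)
```

Setting positions without labels on a pandas bar plot can leave the old fixed labels in place, which would then be assigned in order (first, second, third label) rather than to the intended bars.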
There is an example here for how to create a multi-colored text title.
However, I want to apply this to a plot that already has a figure in it.
For example, if I apply it to the following (the same code as the example, minus a few extras and with another figure):
import matplotlib.pyplot as plt
plt.rcdefaults()
%matplotlib inline
from matplotlib import transforms

fig = plt.figure(figsize=(4, 3), dpi=300)


def rainbow_text(x, y, ls, lc, **kw):
    t = plt.gca().transData
    fig = plt.gcf()
    plt.show()

    # horizontal version
    for s, c in zip(ls, lc):
        text = plt.text(x, y, " " + s + " ", color=c, transform=t, **kw)
        text.draw(fig.canvas.get_renderer())
        ex = text.get_window_extent()
        t = transforms.offset_copy(text._transform, x=ex.width, units='dots')


plt.figure()
rainbow_text(0.5, 0.5, "all unicorns poop rainbows ! ! !".split(),
             ['red', 'orange', 'brown', 'green', 'blue', 'purple', 'black'],
             size=40)
...the result is 2 plots with the title enlarged.
This sort of makes sense to me because I'm using plt. two times.
But how do I integrate it so that it only refers to the first instance of plt. in creating the title?
Also, about this line:
t = transforms.offset_copy(text._transform, x=ex.width, units='dots')
I notice it can alter the spacing between words, but when I play with the values of x, results are not predictable (spacing is inconsistent between words).
How can I meaningfully adjust that value?
And finally, where it says "units='dots'", what are the other options? Are 'dots' 1/72nd of an inch (and is that the default for Matplotlib?)?
How can I convert units from dots to inches?
Thanks in advance!
In fact, the bounding box of the text comes in display units (pixels), unlike the data units used, for example, in a scatter plot. Text is a different kind of object that gets redrawn if you resize the window or change the aspect ratio. With a fixed window size, you can ask for the coordinates of the bounding box in data units and build your coloured text that way:
import matplotlib.pyplot as plt

a = "all unicorns poop rainbows ! ! !".split()
c = ['red', 'orange', 'brown', 'green', 'blue', 'purple', 'black']

f = plt.figure(figsize=(4, 3), dpi=120)
ax = f.add_subplot(111)
r = f.canvas.get_renderer()
space = 0.1
w = 0.5
for word, colour in zip(a, c):
    t = ax.text(w, 1.2, word, color=colour, fontsize=12, ha='left')
    # Convert the text's bounding box from display (pixel) units to data units
    transf = ax.transData.inverted()
    bb = t.get_window_extent(renderer=r)
    bb = bb.transformed(transf)
    w = w + bb.xmax - bb.xmin + space
plt.ylim(0.5, 2.5)
plt.xlim(0.6, 1.6)
plt.show()
This results in:
This, however, is still not ideal, because the spacing between words is only correct as long as the axis limits and the figure size stay the same. That is somewhat arbitrary, but if your program keeps those fixed, working in plot (data) units is a feasible way to achieve what you want.
ORIGINAL POST:
plt. is just the call to the library. In fact you are creating an instance of plt.figure in the global scope (so it can also be seen locally in the function). Because you use the same variable name, you are overwriting the figure (so in the end there is only one instance). To solve this, control the names of your figure instances. For example:
import matplotlib.pyplot as plt
#%matplotlib inline
from matplotlib import transforms

fig = plt.figure(figsize=(4, 3), dpi=300)


def rainbow_text(x, y, ls, lc, **kw):
    t = plt.gca().transData
    figlocal = plt.gcf()

    # horizontal version
    for s, c in zip(ls, lc):
        text = plt.text(x, y, " " + s + " ", color=c, transform=t, **kw)
        text.draw(figlocal.canvas.get_renderer())
        ex = text.get_window_extent()
        t = transforms.offset_copy(text._transform, x=ex.width, units='dots')

    plt.show()


#plt.figure()
rainbow_text(0.5, 0.5, "all unicorns poop rainbows ! ! !".split(),
             ['red', 'orange', 'brown', 'green', 'blue', 'purple', 'black'],
             size=40)
I've commented out several instructions, but notice that I give the figure local to the function a different name (figlocal). Note also that plt.show() does not take a figure argument; it displays all open figures.
As for your other questions, notice that you can use other units, as can be seen in the function documentation:
Return a new transform with an added offset.
args:
    trans is any transform
kwargs:
    fig is the current figure; it can be None if units are 'dots'
    x, y give the offset
    units is 'inches', 'points' or 'dots'
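To make those units concrete: in offset_copy, 'dots' are display pixels, so their physical size depends on the figure's dpi; 'points' are always 1/72 inch; and the default is 'inches'. So dots are not 1/72 inch unless the figure dpi happens to be 72. The sketch below uses a hypothetical 150-pixel width (as might come from get_window_extent()) to convert dots to inches (divide by fig.dpi) and to points (multiply inches by 72), then checks that an offset given in points moves the transform by the expected number of pixels.

```python
import matplotlib.pyplot as plt
from matplotlib import transforms

fig, ax = plt.subplots(figsize=(4, 3), dpi=300)

# Hypothetical width in dots (display pixels), e.g. from text.get_window_extent().
width_dots = 150.0
width_inches = width_dots / fig.dpi   # 150 / 300 = 0.5 inch
width_points = width_inches * 72      # 0.5 * 72 = 36 points

# With 'points' or 'inches' the figure is required (to know its dpi);
# with 'dots' fig may be None because the offset is already in pixels.
shifted = transforms.offset_copy(ax.transData, fig=fig, x=width_points, units='points')

# The shifted transform lands the same data point 150 pixels further right.
dx = shifted.transform((0, 0))[0] - ax.transData.transform((0, 0))[0]
```

Working in points or inches has the advantage that the offset keeps the same physical size if the figure is saved at a different dpi, whereas a value in dots does not.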
EDIT: Apparently there is a problem with the bounding-box extents of text: they do not give the correct width of each word, so the spacing between words is not stable. My advice is to use Matplotlib's LaTeX functionality to write all the colours in the same string (so there is only one call to plt.text). You can do it like this:
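import matplotlib
matplotlib.use('pgf')  # select the backend before importing pyplot
import matplotlib.pyplot as plt
from matplotlib import rc

rc('text', usetex=True)
rc('text.latex', preamble=r'\usepackage{color}')

a = "all unicorns poop rainbows ! ! !".split()
c = ['red', 'orange', 'brown', 'green', 'blue', 'purple', 'black']
st = ''
for word, colour in zip(a, c):
    # trailing space keeps the words separated in the LaTeX output
    st = st + r'\textcolor{' + colour + '}{' + word + '} '
plt.text(0.5, 0.5, st)
plt.savefig('rainbow.pdf')  # the pgf backend cannot open a window, so save to a file instead of plt.show()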
This, however, is not an ideal solution, because you need to have LaTeX installed, including the necessary packages (notice I'm using the color package). Take a look at Yann's answer to this question: Partial coloring of text in matplotlib
@armatita: I think your answer actually does what I need. I thought I needed display coordinates instead, but it looks like I can just use the data coordinates of ax1, if that's what this is (I'm planning to use multiple axes via subplot2grid). Here's an example:
import matplotlib.pyplot as plt
%matplotlib inline
dpi=300
f_width=4
f_height=3
f = plt.figure(figsize=(f_width,f_height), dpi=dpi)
ax1 = plt.subplot2grid((100,115), (0,0), rowspan=95, colspan=25)
ax2 = plt.subplot2grid((100,115), (0,30), rowspan=95, colspan=20)
ax3 = plt.subplot2grid((100,115), (0,55), rowspan=95, colspan=35)
ax4 = plt.subplot2grid((100,115), (0,95), rowspan=95, colspan=20)
r = f.canvas.get_renderer()
t = ax1.text(.5, 1.1, 'a lot of text here',fontsize=12,ha='left')
space=0.1
w=.5
transf = ax1.transData.inverted()
bb = t.get_window_extent(renderer=r)
bb = bb.transformed(transf)
e = ax1.text(.5+bb.width+space, 1.1, 'text',fontsize=12,ha='left')
print(bb)
plt.show()
I'm not sure what you mean about controlling the axis size, though. Are you referring to using the code in different environments, or to exporting the image at different sizes? I plan to use the image in the same environment and at the same size (per use of this approach), so I think it will be okay. Does my logic make sense? I have a weak grasp of what's really going on, so I hope so. I would use it in a function (by splitting the text) like you did, but there are cases where I need to split on other characters (e.g., when a word in parentheses should be coloured but the parentheses should not). Maybe I can just put a delimiter such as ',' in the string? I think I need a different form of .split(), because it didn't work when I tried it.
At any rate, if I can implement this across all of my charts, it will save me countless hours. Thank you so much!
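For the parentheses case raised above, Python's re.split with a capturing group keeps the matched delimiters as separate pieces, so each bracket can get its own colour without inserting commas into the title. A minimal sketch; the title string and the colour rule (colour only the word directly after an opening parenthesis) are hypothetical choices for illustration.

```python
import re

title = "Group 1 vs. Group 2 (sub1) and (sub2)"

# The capturing group ([()]) makes re.split keep each '(' and ')' as its own
# element instead of discarding it; empty strings at the ends are dropped.
pieces = [p for p in re.split(r'([()])', title) if p]

# Hypothetical colour rule: colour only the word right after an opening '('.
colors = ['green' if i > 0 and pieces[i - 1] == '(' else 'black'
          for i in range(len(pieces))]
```

The resulting pieces and colors lists have the same length, so they can be passed to a function like the colortext helper in the next post in place of the comma-split string.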
Here is an example where there are 2 plots and 2 instances of using the function for posterity:
import matplotlib.pyplot as plt
%matplotlib inline

dpi = 300
f_width = 4
f_height = 3
f = plt.figure(figsize=(f_width, f_height), dpi=dpi)  # f is the figure object
ax1 = plt.subplot2grid((100, 60), (0, 0), rowspan=95, colspan=30)
ax2 = plt.subplot2grid((100, 60), (0, 30), rowspan=95, colspan=30)

string = "Group 1 ,vs. ,Group 2 (,sub1,) and (,sub2,)".split(',')
color = ['black', 'red', 'black', 'green', 'black', 'blue', 'black']
xpos = .5
ypos = 1.2
axis = ax1
# No need to include space if included between delimiters above
# space = 0.1


def colortext(f, string, color, xpos, ypos, axis):
    # f = figure object name (i.e. fig, f, figure)
    r = f.canvas.get_renderer()
    for piece, piece_color in zip(string, color):
        t = axis.text(xpos, ypos, piece, color=piece_color, fontsize=12, ha='left')
        # Convert the piece's width from display (pixel) units to data units
        transf = axis.transData.inverted()
        bb = t.get_window_extent(renderer=r)
        bb = bb.transformed(transf)
        xpos = xpos + bb.xmax - bb.xmin


colortext(f, string, color, xpos, ypos, axis)
string2 = "Group 1 part 2 ,vs. ,Group 2 (,sub1,) and (,sub2,)".split(',')
ypos2 = 1.1
colortext(f, string2, color, xpos, ypos2, axis)
plt.show()
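A different technique from the ones used in this thread, sketched here as an alternative: chain the pieces with ax.annotate, passing the previous Text artist as xycoords, so each piece is positioned relative to the previous piece's bounding box at draw time instead of from a width measured once in data units. That way the spacing does not depend on keeping the axis limits fixed. colortext_chained is a hypothetical helper name; the sketch assumes the pieces and colours from the example above and places the title just above the axes (y = 1.05 in axes-fraction coordinates).

```python
import matplotlib.pyplot as plt


def colortext_chained(ax, x, y, pieces, colors, **kwargs):
    """Hypothetical helper: draw `pieces` side by side on `ax`, one colour each.

    The first piece is placed at (x, y) in axes-fraction coordinates. Each
    later piece is an annotation anchored to the bottom-right corner of the
    previous text, so positions are recomputed whenever the figure is drawn.
    """
    text = ax.text(x, y, pieces[0], color=colors[0], transform=ax.transAxes,
                   va='bottom', **kwargs)
    texts = [text]
    for piece, color in zip(pieces[1:], colors[1:]):
        # xycoords=text: xy is a fraction of the previous text's bounding box,
        # so (1, 0) is its bottom-right corner.
        text = ax.annotate(piece, xy=(1, 0), xycoords=text, va='bottom',
                           color=color, **kwargs)
        texts.append(text)
    return texts


fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(6, 3), dpi=100)
pieces = "Group 1 ,vs. ,Group 2 (,sub1,) and (,sub2,)".split(',')
colors = ['black', 'red', 'black', 'green', 'black', 'blue', 'black']
# The helper only draws on the Axes it is given, so no extra figure is created.
texts = colortext_chained(ax1, 0.0, 1.05, pieces, colors, fontsize=10)
# Call plt.show() to display the figure in an interactive session.
```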