How do the parameters 'c' and 'cmap' behave in a matplotlib scatter plot? - python-3.x

For the pyplot.scatter(x, y, s, c, ...) function,
the matplotlib docs state:
c : color, sequence, or sequence of color, optional, default: 'b' The
marker color. Possible values:
A single color format string. A sequence of color specifications of
length n. A sequence of n numbers to be mapped to colors using cmap
and norm. A 2-D array in which the rows are RGB or RGBA. Note that c
should not be a single numeric RGB or RGBA sequence because that is
indistinguishable from an array of values to be colormapped. If you
want to specify the same RGB or RGBA value for all points, use a 2-D
array with a single row.
However, I do not understand how I can change the colors of the data points as I wish.
I have this piece of code:
import matplotlib.pyplot as plt
import numpy as np
import sklearn
import sklearn.datasets
import sklearn.linear_model
import matplotlib
%matplotlib inline
matplotlib.rcParams['figure.figsize'] = (13.0, 9.0)
# Generate a dataset and plot it
np.random.seed(0)
X, y = sklearn.datasets.make_moons(200, noise=0.55)
print(y)
plt.scatter(X[:,0], X[:,1], c=y)#, cmap=plt.cm.Spectral)
the output plot
How can I change the colours to, say, black and green data points if I wish, or to something else? Also, please explain what exactly cmap does.
Why are my plots magenta and blue every time I use plt.cm.Spectral?

There are essentially two options for colorizing scatter points.
1. External mapping
You may externally map values to color and supply a list/array of those colors to the scatter's c argument.
z = np.array([1,0,1,0,1])
colors = np.array(["black", "green"])
plt.scatter(x,y, c=colors[z])
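As a minimal sketch of what colors[z] evaluates to (the label values below are made up for illustration), NumPy integer-array indexing picks one color name per label:

```python
import numpy as np

# Hypothetical class labels (0 or 1), one per point
z = np.array([1, 0, 1, 0, 1])

# Position i of this array holds the color for label i
colors = np.array(["black", "green"])

# Integer-array indexing returns one color name per label
point_colors = colors[z]
print(point_colors)
```

The resulting array of color names has the same length as z, so it can be passed directly as the c argument.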
2. Internal mapping
Apart from explicit colors, one can also supply a list/array of values which should be mapped to colors according to a normalization and a colormap.
A colormap is a callable that takes float values between 0. and 1. as input and returns an RGBA color.
A normalization is a callable that takes any number as input and outputs another number, based on some previously set limits. The usual case of Normalize would provide a linear mapping of values between vmin and vmax to the range between 0. and 1..
The natural way to obtain a color from some data is hence to chain the two,
cmap = plt.cm.Spectral
norm = plt.Normalize(vmin=4, vmax=5)
z = np.array([4,4,5,4,5])
plt.scatter(x,y, c = cmap(norm(z)))
Here the value of 4 would be mapped to 0 by the normalization, and the value of 5 would be mapped to 1, such that the colormap provides its two outermost colors.
This process happens internally in scatter if an array of numeric values is provided to c.
A scatter creates a PathCollection, which subclasses ScalarMappable. A ScalarMappable consists of a colormap, a normalization and an array of values. Hence the above is internalized via
plt.scatter(x,y, c=z, norm=norm, cmap=cmap)
If the minimum and maximum data are to be used as limits for the normalization, you may leave that argument out.
plt.scatter(x,y, c=z, cmap=cmap)
This is the reason that the output in the question will always be purple and yellow dots, independent of the values provided to c.
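The effect of the automatic limits can be checked without plotting. The following is a small sketch with made-up label arrays: a Normalize created without vmin and vmax takes its limits from the data it is first called on, so any two-valued array ends up as 0 and 1.

```python
import numpy as np
import matplotlib.pyplot as plt

# Two hypothetical label arrays with different values but the same pattern
labels_a = np.array([0, 1, 1, 0])
labels_b = np.array([4, 5, 5, 4])

# Without vmin/vmax, each Normalize autoscales to the data it is called on
scaled_a = np.asarray(plt.Normalize()(labels_a))
scaled_b = np.asarray(plt.Normalize()(labels_b))

print(scaled_a)
print(scaled_b)
```

Both arrays are scaled to the same values, which is why the colormap returns the same two end colors in either case.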
Coming back to the requirement of mapping an array of 0 and 1 to black and green color you may now look at the colormaps provided by matplotlib and look for a colormap which comprises black and green. E.g. the nipy_spectral colormap
Here black is at the start of the colormap and green somewhere in the middle, say at 0.5. One would hence need to set vmin to 0, and vmax, such that vmax*0.5 = 1 (with 1 the value to be mapped to green), i.e. vmax = 1./0.5 == 2.
import matplotlib.pyplot as plt
import numpy as np
x,y = np.random.rand(2,6)
z = np.array([0,0,1,1,0,1])
plt.scatter(x,y, c = z,
norm = plt.Normalize(vmin=0, vmax=2),
cmap = "nipy_spectral")
plt.show()
Since there may not always be a colormap with the desired colors available, and since it may not be straightforward to obtain the color positions from existing colormaps, an alternative is to create a new colormap specifically for the desired purpose.
Here we might simply create a colormap of the two colors black and green.
matplotlib.colors.ListedColormap(["black", "green"])
We would not need any normalization here, because we only have two values and can hence rely on automatic normalization.
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import numpy as np
x,y = np.random.rand(2,6)
z = np.array([0,0,1,1,0,1])
plt.scatter(x,y, c = z, cmap = mcolors.ListedColormap(["black", "green"]))
plt.show()
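To see why no explicit normalization is needed, here is a short sketch that queries the two-color colormap at both ends of the normalized range, which is where the values 0 and 1 land after automatic normalization:

```python
import matplotlib.colors as mcolors

cmap = mcolors.ListedColormap(["black", "green"])

# After automatic normalization, the smallest value becomes 0.0 and the largest 1.0
low = mcolors.to_hex(cmap(0.0))
high = mcolors.to_hex(cmap(1.0))
print(low, high)
```

The lower end resolves to black and the upper end to green, matching the order of the list passed to ListedColormap.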

First, to set the colors according to the values in y, you can do this:
color = ['red' if i==0 else 'green' for i in y]
plt.scatter(X[:,0], X[:,1], c=color)
Now talking about scatter() and cmap.
ColorMaps are used to provide colors from float values. See this documentation for reference on colormaps.
For values between 0 to 1, a color is chosen from these colormaps.
For example:
plt.cm.Spectral(0.0)
# (0.6196078431372549, 0.00392156862745098, 0.25882352941176473, 1.0) #<== magenta
plt.cm.Spectral(1.0)
# (0.3686274509803922, 0.30980392156862746, 0.6352941176470588, 1.0) #<== blue
plt.cm.Spectral(1)
# (0.6280661284121491, 0.013302575932333718, 0.26082276047673975, 1.0)
Note that the results of 1.0 and 1 are different in above code, because the int and floats are handled differently as mentioned in documentation of __call__() here:
For floats, X should be in the interval [0.0, 1.0] to return the
RGBA values X*100 percent along the Colormap line.
For integers, X should be in the interval [0, Colormap.N) to
return RGBA values indexed from the Colormap with index X.
Please look at this answer for a better explanation of colormaps:
https://stackoverflow.com/a/25408562/3374996
In your y, you have 0 and 1, so the RGBA values shown in above code are used (which are representing two ends of the Spectral colormap).
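The difference between integer and float arguments can be made explicit with the colormap's N attribute (its number of entries). This is a small sketch, not part of the original code:

```python
import matplotlib.pyplot as plt

cmap = plt.get_cmap("Spectral")

# Integers are treated as indices into the colormap's lookup table
first_by_index = cmap(0)
second_by_index = cmap(1)
last_by_index = cmap(cmap.N - 1)

# Floats are treated as positions in the interval [0.0, 1.0]
first_by_float = cmap(0.0)
last_by_float = cmap(1.0)

print(cmap.N)
print(second_by_index)
print(last_by_float)
```

So cmap(1) is the second entry of the table, very close to the first color, while cmap(1.0) is the last entry.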
Now here's how c and cmap parameters in plt.scatter() interact with each other.
+----+-----------------+-----------------+----------------+------------------------------+
| No | type of x, y    | c type          | values in c    | result                       |
+----+-----------------+-----------------+----------------+------------------------------+
| 1  | single point    | scalar          | numbers        | cmap(0.0), no matter what    |
|    |                 |                 |                | the value in c               |
+----+-----------------+-----------------+----------------+------------------------------+
| 2  | array of points | array           | numbers        | normalize the values in c,   |
|    |                 |                 |                | cmap(normalized val in c)    |
+----+-----------------+-----------------+----------------+------------------------------+
| 3  | scalar or array | scalar or array | RGBA values,   | no use of cmap,              |
|    |                 |                 | color strings  | use colors from c            |
+----+-----------------+-----------------+----------------+------------------------------+
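Once the actual colors are determined, scatter assigns them to the points in x, y in order. If x, y has no more points than there are colors in c, every point gets its own color; otherwise the colors are cycled and reused from the start. (This cycling reflects the Matplotlib version used for this answer; newer releases may instead raise an error when the number of colors in c does not match the number of points.)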
Here's an example to illustrate this:
# Case 1 from above table
# All three points get the same color = plt.cm.Spectral(0)
plt.scatter(x=0.0, y=0.2, c=0, cmap=plt.cm.Spectral)
plt.scatter(x=0.0, y=0.3, c=1, cmap=plt.cm.Spectral)
plt.scatter(x=0.0, y=0.4, c=1.0, cmap=plt.cm.Spectral)
# Case 2 from above table
# The values in c are normalized
# highest value in c gets plt.cm.Spectral(1.0)
# lowest value in c gets plt.cm.Spectral(0.0)
# Others in between as per normalizing
# Size of arrays in x, y, and c must match here, else error is thrown
plt.scatter([0.1, 0.1, 0.1, 0.1, 0.1], [0.2, 0.3, 0.4, 0.5, 0.6],
c=[1, 2, 3, 4, 5], cmap=plt.cm.Spectral)
# Case 3 from above table => No use of cmap here,
# blue is assigned to the point
plt.scatter(x=0.2, y=0.3, c='b')
# You can also provide rgba tuple
plt.scatter(x=0.2, y=0.4, c=plt.cm.Spectral(0.0))
# Since a single point is present, the first color (green) is given
plt.scatter(x=0.2, y=0.5, c=['g', 'r'])
# Same color 'cyan' is assigned to all values
plt.scatter([0.3, 0.3, 0.3, 0.3, 0.3], [0.2, 0.3, 0.4, 0.5, 0.6],
c='c')
# Colors are cycled through points
# 4th point will get again first color
plt.scatter([0.4, 0.4, 0.4, 0.4, 0.4], [0.2, 0.3, 0.4, 0.5, 0.6],
c=['m', 'y', 'k'])
# Same way for rgba values
# Third point will get first color again
plt.scatter([0.5, 0.5, 0.5, 0.5, 0.5], [0.2, 0.3, 0.4, 0.5, 0.6],
c=[plt.cm.Spectral(0.0), plt.cm.Spectral(1.0)])
Output:
Go through the comments in the code and location of points along with the colors to understand thoroughly.
You can also replace the param c with color in the code of Case 3 and the results will still be same.
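Related to Case 3, the documentation quoted in the question recommends a 2-D array with a single row when one RGBA value should apply to all points. A minimal sketch of that, with made-up coordinates:

```python
import numpy as np
import matplotlib.pyplot as plt

# One RGBA tuple taken from the colormap
rgba = plt.get_cmap("Spectral")(0.0)

# Wrap it in a 2-D array with a single row, shape (1, 4)
same_color = np.array([rgba])

# Hypothetical coordinates; all three points share the one color
sc = plt.scatter([0.6, 0.6, 0.6], [0.2, 0.3, 0.4], c=same_color)
face = sc.get_facecolor()
print(same_color.shape, face)
```

Passing the bare 4-tuple instead would be ambiguous when there happen to be four points, since it could also be read as four numbers to be colormapped.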

Related

format text cell for empty values and color cell for all negative values

I am generating a plotly heatmap as follow:
@app.callback(
    Output('graph', 'figure'),
    [
        Input('submit', 'n_clicks')
    ],
    prevent_initial_call=True
)
def update_plot(n_clicks):
    if n_clicks:
        my_row = ['T-3', 'T-2', 'T-1']
        col = ['T-2', 'T-1', 'T0']
        df_stat = pd.DataFrame([[12, -3.5, 7.8], [np.nan, 0.5, -19], [np.nan, np.nan, 56]], columns=col)
        df_stat.index = my_row
        fig = go.Figure()
        fig.add_trace(go.Heatmap(
            x=df_stat.columns,
            y=df_stat.index,
            z=df_stat.values.tolist(),
            # zauto=True,
            zmax=0.67,
            zmin=0,
            hoverongaps=False,
            showscale=True,
            colorscale='OrRd',
            text=df_stat.to_numpy(),
            texttemplate="%{text}",
            hovertemplate='My number: %{z:.2f}<extra></extra>',
            textfont={"color": "black"},
            # autocolorscale=True
        ))
        fig.update_yaxes(autorange="reversed", type='category', categoryorder='array', categoryarray=my_row)
        fig.update_xaxes(automargin=True, side='top', type='category', categoryorder='array', categoryarray=col)
        fig.update_layout(height=600, width=1200)
        return fig
The dataframe used as input is triangular (made of np.nan and floats). What I am trying to achieve is the following:
for the lower triangular part the text should be "" and not nan or null. The background color for these "empty" cells should be transparent.
for the colorscale, would it be possible to have it applied only to positive values? Hence having the "OrRd" colorscale for the positive values and light grey for any negative value? I tried setting zmin/zmax but then the negative values get the minimal color of the colorscale. I am looking for the "OrRd" colorscale with its minimal value set to light grey. Setting autocolorscale to True and zauto to True seems to disregard the chosen colorscale.
for the positive values, is it possible to use the autocolorscale (i.e. set to True) while keeping the colorscale as "OrRd"? I played around with zauto/zmin-zmax and autocolorscale but couldn't get the desired colorscale.
This is how it currently looks:
I am looking at something similar to this:
There isn't any way to use the default parameters in plotly to achieve what you want, but we can get your desired result with a few workarounds.
To deal with NaNs in df_stat displaying as null, you can use the .fillna("") method with an empty string. The background for these cells is already transparent but the background color of the plot is showing through. Since your desired plot appears to have a white background with grey lines, we can take a shortcut and set the template to 'plotly_white'.
I don't think colorscales with multiple conditions (e.g. one color scale for x>=0 and a single color for x<0) exist, so we'll need to mask your df_stat DataFrame and plot the two parts as separate traces.
To achieve this, we'll make two separate masks of df_stat:
df_stat_non_negative will contain the non-negative values of df_stat with all other values set to NaN. Also, I am not sure the 'OrRd' colorscale is actually what you want, since it looks like you want small positive values to be grey. What you can do is specify multiple colors at normalized values between 0 and 1. For example, if you set colorscale=[[0, 'lightgrey'],[0.10, 'LightSalmon'],[1.0,'DarkRed']], you'll get lightgrey blending into lightsalmon for relatively small values in your heatmap, but the majority of the heatmap will resemble 'OrRd'.
df_stat_negative will contain the negative values of df_stat with all other values set to NaN and for the heatmap, we'll create a custom colorscale where normalized values from 0 to 1 are both set to grey (this will ensure the heatmap corresponding to df_stat_negative has every cell colored grey regardless of the value). We'll also hide the colorscale for the heatmap of negative values.
Putting this all together:
import numpy as np
import pandas as pd
import plotly.graph_objects as go
my_row = ['T-3', 'T-2', 'T-1']
col = ['T-2', 'T-1', 'T0']
df_stat = pd.DataFrame([[12, -3.5, 7.8], [np.nan, 0.5, -19], [np.nan, np.nan, 56]], columns=col)
df_stat.index = my_row
## only plot positive df_stat values
df_stat_non_negative = df_stat.copy()
df_stat_non_negative[df_stat_non_negative < 0] = np.nan
df_stat_non_negative.fillna("", inplace=True)
fig = go.Figure()
fig.add_trace(go.Heatmap(
x=df_stat_non_negative.columns,
y=df_stat_non_negative.index,
z=df_stat_non_negative.values.tolist(),
# zauto=True,
zmin=0,
hoverongaps=False,
showscale=True,
colorscale=[[0, 'lightgrey'],[0.10, 'LightSalmon'],[1.0,'DarkRed']],
text=df_stat_non_negative.to_numpy(),
texttemplate="%{text}",
hovertemplate='My number: %{z:.2f}<extra></extra>',
textfont={"color": "black"},
# autocolorscale=True
))
df_stat_negative = df_stat.copy()
df_stat_negative[df_stat_negative >= 0] = np.nan
df_stat_negative.fillna("", inplace=True)
fig.add_trace(go.Heatmap(
x=df_stat_negative.columns,
y=df_stat_negative.index,
z=df_stat_negative.values.tolist(),
# zauto=True,
zmax=0,
zmin=0,
hoverongaps=False,
showscale=False,
colorscale=[[0, 'lightgrey'],[1.0, 'lightgrey']],
text=df_stat_negative.to_numpy(),
texttemplate="%{text}",
hovertemplate='My number: %{z:.2f}<extra></extra>',
textfont={"color": "black"},
# autocolorscale=True
))
fig.update_yaxes(autorange="reversed", type='category', categoryorder='array', categoryarray=my_row)
fig.update_xaxes(automargin=True, side='top', type='category', categoryorder='array', categoryarray=col)
fig.update_layout(height=600, width=1200, template='plotly_white')
fig.show()
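As an alternative sketch of the masking step only (not the code used above), pandas' DataFrame.where can build both masks while keeping the data numeric. The blank text labels are then derived separately; the variable names here are illustrative assumptions.

```python
import numpy as np
import pandas as pd

df_stat = pd.DataFrame(
    [[12, -3.5, 7.8], [np.nan, 0.5, -19], [np.nan, np.nan, 56]],
    index=['T-3', 'T-2', 'T-1'],
    columns=['T-2', 'T-1', 'T0'],
)

# where() keeps values for which the condition holds and sets the rest to NaN;
# comparisons with NaN are False, so the empty triangle stays NaN in both masks
non_negative = df_stat.where(df_stat >= 0)
negative = df_stat.where(df_stat < 0)

# Text labels: blank strings wherever the mask has no value
labels = non_negative.astype(object).where(non_negative.notna(), "")

print(non_negative)
print(negative)
print(labels)
```

With this approach the numeric frames could be passed as z and the label frame as text, so the empty strings never end up in the numeric data.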

Misunderstanding in a Matplotlib program

I was working on the code "Discrete distribution as horizontal bar chart", found here LINK, using Matplotlib 3.1.1
I've been circling around the question for a while, but I still can't figure it out: what's the meaning of the instruction: category_colors = plt.get_cmap('RdYlGn')(np.linspace(0.15, 0.85, data.shape[1])) ?
As np.linspace(0.15, 0.85, data.shape[1]) resolves to array([0.15 , 0.325, 0.5 , 0.675, 0.85 ]), I first thought that the program was using the colormap RdYlGn (supposed to go from color=0.0 to color=1.0) and was then taking the 5 specific colors located at point 0.15, etc., 0.85
But, printing category_colors resolves to a (5, 4) array:
array([[0.89888504, 0.30549789, 0.20676663, 1. ],
[0.99315648, 0.73233372, 0.42237601, 1. ],
[0.99707805, 0.9987697 , 0.74502115, 1. ],
[0.70196078, 0.87297193, 0.44867359, 1. ],
[0.24805844, 0.66720492, 0.3502499 , 1. ]])
I don't understand what these numbers refer to.
plt.get_cmap('RdYlGn') returns a function which maps a number between 0 and 1 to a corresponding color, where 0 gets mapped to red, 0.5 to yellow and 1 to green. Often, this function is stored in a variable, as in cmap = plt.get_cmap('RdYlGn'). Then cmap(0.0) (which is the same as plt.get_cmap('RdYlGn')(0.0)) would be the rgba-value (0.6470588235294118, 0.0, 0.14901960784313725, 1.0) for (red, green, blue, alpha). In hexadecimal, this color would be #a50026.
Because a colormap accepts arrays as well as single numbers, cmap(np.array([0.15 , 0.325, 0.5 , 0.675, 0.85 ])) gets the same result as np.array([cmap(0.15), cmap(0.325), ..., cmap(0.85)]). (In other words, as with many numpy functions, applying it to an array returns an array of the function applied to the individual elements.)
So, the first row of category_colors = cmap(np.linspace(0.15, 0.85, 5)) will be the rgba-values of the color corresponding to value 0.15, or 0.89888504, 0.30549789, 0.20676663, 1.. This is a color with 90% red, 31% green and 21% blue (and alpha=1 for fully opaque), so quite reddish. The next row holds the rgba values corresponding to 0.325, and so on.
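The elementwise behaviour can be checked directly. This is a small sketch comparing one call on the whole array against separate calls per value:

```python
import numpy as np
import matplotlib.pyplot as plt

cmap = plt.get_cmap('RdYlGn')
positions = np.linspace(0.15, 0.85, 5)

# One call on the whole array: returns a (5, 4) array of RGBA rows
all_at_once = cmap(positions)

# One call per value, stacked into an array afterwards
one_by_one = np.array([cmap(float(p)) for p in positions])

print(all_at_once.shape)
print(np.allclose(all_at_once, one_by_one))
```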
Here is some code to illustrate the concepts:
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex # convert a color to hexadecimal format
from matplotlib.cm import ScalarMappable # needed to create a custom colorbar
import numpy as np
cmap = plt.get_cmap('RdYlGn')
color_values = np.linspace(0.15, 0.85, 5)
category_colors = cmap(color_values)
plt.barh(color_values, 1, height=0.15, color=category_colors)
plt.yticks(color_values)
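# pass ax explicitly; newer Matplotlib versions need it for a mappable that is not attached to an Axes
plt.colorbar(ScalarMappable(cmap=cmap), ax=plt.gca(), ticks=color_values)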
plt.ylim(0, 1)
plt.xlim(0, 1.1)
plt.xticks([])
for val, color in zip(color_values, category_colors):
    r, g, b, a = color
    plt.text(0.1, val, f'r:{r:0.2f} g:{g:0.2f} b:{b:0.2f} a:{a:0.1f}\nhex:{to_hex(color)}', va='center')
plt.show()
PS: You might also want to read about norms, which map an arbitrary range to the range 0,1 to be used by colormaps.
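As a brief sketch of what such a norm does (the data values and limits below are made up), Normalize rescales an arbitrary range linearly to 0..1, and the result can be fed straight into the colormap:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

cmap = plt.get_cmap('RdYlGn')

# Hypothetical data on an arbitrary scale
data = np.array([10.0, 30.0, 50.0])

# Map the interval [10, 50] linearly onto [0, 1]
norm = Normalize(vmin=10, vmax=50)
scaled = np.asarray(norm(data))

# Chain norm and colormap to get one RGBA row per data value
colors = cmap(norm(data))
print(scaled)
print(colors)
```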

Plot list as a colorbar to efficiently visualize outliers

I'd like to generate a colorbar for values stored in a list
from typing import List

import matplotlib.colors
from matplotlib import cm


def map_values_to_color(data: List, cmap: str, integer=False):
    # scale the data linearly to [0, 1] between its minimum and maximum
    norm = matplotlib.colors.Normalize(vmin=min(data), vmax=max(data), clip=True)
    mapper = cm.ScalarMappable(norm=norm, cmap=cmap)
    if integer:
        color = [[r, g, b] for r, g, b, a in mapper.to_rgba(data, bytes=True)]
    else:
        color = [[r, g, b] for r, g, b, a in mapper.to_rgba(data)]
    colorlist = [(val, color) for val, color in zip(data, color)]
    return colorlist


if __name__ == '__main__':
    vals = [100, .80, .10, .79, .70, .60, .75, .78, .65, .90]
    colorlist = map_values_to_color(data=vals, cmap='bwr_r', integer=True)
Any suggestions on how to generate just the colorbar will be really helpful.
EDIT:
Output obtained from the below code:
EDIT2:
The below answer might be useful for lists without outliers. However, my data has outliers and I am still looking for suggestions/ inputs on how to visualize the data with outliers efficiently using a colorbar i.e some sort of a discrete colorbar.
As they told you, you need a 2-D array to use imshow; since a list is inherently one-dimensional, use an array with 1 row and N columns.
Further, we can apply a few cosmetic changes to the ticks to simplify the plot (I removed the y ticks because you do not really have a y axis) and to make it easier to identify the outliers (I specified a denser set of x ticks; beware that for a really long list this must be adapted in some way).
Last but not least, you have a strict definition of outliers and want a colormap that describes the valid range in detail and highlights the outliers. In this regard I have adapted an answer by Joe Kington, in which we modify the colormap to show contrasting colors for the outliers and specify, at the level of imshow, the range outside of which values count as outliers.
Here is my script (note the extended slicing syntax that makes a 1-D array a 2-D one, the use of vmin and vmax, the use of extend='both' and how we set the contrasting colors for the outliers):
import numpy as np
import matplotlib.pyplot as plt
vals = [100, .80, .10, .79, .70, -80, .75, .78, .65, .90]
arr = np.array(vals)[None, :] # arr.shape ⇒ (1, 10)
### #######################
im = plt.imshow(arr, vmin=0, vmax=1, cmap='bwr')
cbar = plt.colorbar(im, extend='both', orientation='horizontal')
# contrasting colors for values below vmin and above vmax
cbar.cmap.set_under('yellow')
cbar.cmap.set_over('green')
plt.xticks(range(len(vals)))
plt.yticks(())
plt.show()
The script produces
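To check which list entries end up with the outlier colors, the same mapping can be reproduced without drawing anything. This is a sketch under the assumption that working on a copy of the colormap is acceptable; it uses copy.copy so the registered 'bwr' colormap is left untouched.

```python
import copy
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize, to_hex

vals = np.array([100, .80, .10, .79, .70, -80, .75, .78, .65, .90])

# Work on a copy so the shared 'bwr' colormap is not modified
cmap = copy.copy(plt.get_cmap('bwr'))
cmap.set_under('yellow')   # color for values below vmin
cmap.set_over('green')     # color for values above vmax

# No clipping: values outside [0, 1] stay outside and get the extreme colors
norm = Normalize(vmin=0, vmax=1)
colors = cmap(norm(vals))

hex_colors = [to_hex(c) for c in colors]
print(hex_colors)
```

Here the entry 100 gets the "over" color and -80 the "under" color, while all values inside the range are taken from the regular 'bwr' gradient.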

Threshold using OpenCv?

As the question states, I want to apply a two-way Adaptive Thresholding technique to my image. That is to say, I want to find each pixel value in the neighborhood and set it to 255 if it is less than or greater than the mean of the neighborhood minus a constant c.
Take this image, for example, as the neighborhood of pixels. The desired pixel areas to keep are the darker areas on the third and sixth squares' upper-half (from left-to-right and top-to-bottom), as well as the eighth and twelfth squares' upper-half.
Obviously, this all depends on the set constant value, but ideally areas that are significantly different than the mean pixel value of the neighborhood will be kept. I can worry about the tuning myself though.
Your question and comment are contradictory: Keep everything (significantly) brighter/darker than the mean (+/- constant) of the neighbourhood (question) vs. keep everything within mean +/- constant (comment). I assume the first one to be correct, and I'll try to give an answer.
Using cv2.adaptiveThreshold is certainly useful; parameterization might be tricky, especially given the example image. First, let's have a look at the output:
We see that the intensity value range in the given image is small. The upper halves of the third and sixth squares don't really differ from their neighbourhood. It's quite unlikely to find a proper difference there. The upper halves of squares #8 and #12 (or also the lower half of square #10) are more likely to be found.
The top row now shows some more "global" parameters (blocksize = 151, c = 25), the bottom row more "local" parameters (blocksize = 51, c = 5). The middle column is everything darker than the neighbourhood (with respect to the parameters), the right column is everything brighter than the neighbourhood. We see that in the more "global" case, we get the proper upper halves, but there are mostly no "significant" darker areas. Looking at the more "local" case, we see some darker areas, but we won't find the complete upper/lower halves in question. That's just because of how the different triangles are arranged.
On the technical side: You need two calls of cv2.adaptiveThreshold, one using the cv2.THRESH_BINARY_INV mode to find everything darker and one using the cv2.THRESH_BINARY mode to find everything brighter. Also, you have to provide c or -c for the two different cases.
Here's the full code:
import cv2
from matplotlib import pyplot as plt
from skimage import io # Only needed for web grabbing images
plt.figure(1, figsize=(15, 10))
img = cv2.cvtColor(io.imread('https://i.stack.imgur.com/dA1Vt.png'), cv2.COLOR_RGB2GRAY)
plt.subplot(2, 3, 1), plt.imshow(img, cmap='gray'), plt.colorbar()
# More "global" parameters
bs = 151
c = 25
img_le = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY_INV, bs, c)
img_gt = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, bs, -c)
plt.subplot(2, 3, 2), plt.imshow(img_le, cmap='gray')
plt.subplot(2, 3, 3), plt.imshow(img_gt, cmap='gray')
# More "local" parameters
bs = 51
c = 5
img_le = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY_INV, bs, c)
img_gt = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, bs, -c)
plt.subplot(2, 3, 5), plt.imshow(img_le, cmap='gray')
plt.subplot(2, 3, 6), plt.imshow(img_gt, cmap='gray')
plt.tight_layout()
plt.show()
Hope that helps – somehow!
-----------------------
System information
-----------------------
Python: 3.8.1
Matplotlib: 3.2.0rc1
OpenCV: 4.1.2
-----------------------
Another way to look at this is that where abs(mean - image) <= c, you want that to become white, otherwise you want that to become black. In Python/OpenCV/Scipy/Numpy, I first compute the local uniform mean (average) using a uniform 51x51 pixel block averaging filter (boxcar average). You could use some weighted averaging method such as the Gaussian average, if you want. Then I compute the abs(mean - image). Then I use Numpy thresholding. Note: You could also just use one simple threshold (cv2.threshold) on the abs(mean-image) result in place of two numpy thresholds.
Input:
import cv2
import numpy as np
from scipy import ndimage
# read image as grayscale
# convert to floats in the range 0 to 1 so that the difference keeps negative values
img = cv2.imread('squares.png',0).astype(np.float32)/255.0
# get uniform (51x51 block) average
ave = ndimage.uniform_filter(img, size=51)
# get abs difference between ave and img and convert back to integers in the range 0 to 255
diff = 255*np.abs(ave - img)
diff = diff.astype(np.uint8)
# threshold
# Note: could also just use one simple cv2.threshold on diff
c = 5
# white where the difference is at most c, black elsewhere
diff_thresh = np.where(diff <= c, 255, 0).astype(np.uint8)
# view result
cv2.imshow("img", img)
cv2.imshow("ave", ave)
cv2.imshow("diff", diff)
cv2.imshow("threshold", diff_thresh)
cv2.waitKey(0)
cv2.destroyAllWindows()
# save result
cv2.imwrite("squares_2way_thresh.jpg", diff_thresh)
Result:

Using matplotlib to represent three variables in two dimensions with colors [duplicate]

I want to make a scatterplot (using matplotlib) where the points are shaded according to a third variable. I've got very close with this:
plt.scatter(w, M, c=p, marker='s')
where w and M are the data points and p is the variable I want to shade with respect to.
However I want to do it in greyscale rather than colour. Can anyone help?
There's no need to manually set the colors. Instead, specify a grayscale colormap...
import numpy as np
import matplotlib.pyplot as plt
# Generate data...
x = np.random.random(10)
y = np.random.random(10)
# Plot...
plt.scatter(x, y, c=y, s=500) # s is a size of marker
plt.gray()
plt.show()
Or, if you'd prefer a wider range of colormaps, you can also specify the cmap kwarg to scatter. There are several different grayscale colormaps pre-made (e.g. gray, gist_yarg, binary, etc). To use the reversed version of any of them, just specify its "_r" variant, e.g. gray_r instead of gray.
import matplotlib.pyplot as plt
import numpy as np
# Generate data...
x = np.random.random(10)
y = np.random.random(10)
plt.scatter(x, y, c=y, s=500, cmap='gray')
plt.show()
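A quick sketch of what the "_r" suffix does: the reversed colormap returns at position 0.0 what the original returns at 1.0, and vice versa.

```python
import matplotlib.pyplot as plt

gray = plt.get_cmap('gray')
gray_r = plt.get_cmap('gray_r')

# 'gray' runs from black to white, 'gray_r' from white to black
print(gray(0.0), gray(1.0))
print(gray_r(0.0), gray_r(1.0))
```

With gray, small values of c are drawn dark; with gray_r, small values are drawn light.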
In matplotlib, grey colors can be given as a string holding a numerical value between 0 and 1.
For example c = '0.1'
Then you can convert your third variable to a value inside this range and use it to color your points.
In the following example I used the y position of the point as the value that determines the color:
from matplotlib import pyplot as plt
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [125, 32, 54, 253, 67, 87, 233, 56, 67]
color = [str(item/255.) for item in y]
plt.scatter(x, y, s=500, c=color)
plt.show()
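Dividing by 255 works above because the values happen to lie between 0 and 255. For a third variable on an arbitrary scale, a min-max rescaling is one option. This is a sketch with made-up values, and it assumes the variable is not constant (otherwise the division would be by zero).

```python
import numpy as np
from matplotlib.colors import to_rgba

# Hypothetical third variable on an arbitrary scale
p = np.array([125, 32, 54, 253, 67], dtype=float)

# Min-max rescaling to the range [0, 1]
scaled = (p - p.min()) / (p.max() - p.min())

# Matplotlib reads a string holding a float in [0, 1] as a grey level
greys = [f"{v:.3f}" for v in scaled]
print(greys)
print(to_rgba(greys[3]))
```

The list greys can then be passed as c to plt.scatter in the same way as color in the example above.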
Sometimes you may need to set the color precisely based on the x-value. For example, you may have a dataframe with 3 types of variables and some data points, and you want to do the following:
Plot points corresponding to Physical variable 'A' in RED.
Plot points corresponding to Physical variable 'B' in BLUE.
Plot points corresponding to Physical variable 'C' in GREEN.
In this case, you may have to write a short function to map the x-values to the corresponding color names as a list and then pass that list to the plt.scatter command.
import matplotlib.pyplot as plt

x = ['A', 'B', 'B', 'C', 'A', 'B']
y = [15, 30, 25, 18, 22, 13]

# Function to map the colors as a list from the input list of x variables
def pltcolor(lst):
    cols = []
    for label in lst:
        if label == 'A':
            cols.append('red')
        elif label == 'B':
            cols.append('blue')
        else:
            cols.append('green')
    return cols

# Create the colors list using the function above
cols = pltcolor(x)
plt.scatter(x=x, y=y, s=500, c=cols)  # Pass on the list created by the function here
plt.grid(True)
plt.show()
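The same mapping can be written more compactly with a dictionary lookup. This is an alternative sketch; the dictionary name is made up for illustration.

```python
x = ['A', 'B', 'B', 'C', 'A', 'B']

# Hypothetical lookup table from category to color name
color_by_category = {'A': 'red', 'B': 'blue', 'C': 'green'}

# One color per point, in the same order as x
cols = [color_by_category[label] for label in x]
print(cols)
```

Unlike the if/elif chain, an unknown category raises a KeyError here instead of silently falling back to green.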
