Coordinate conversion problem of a FITS file - python-3.x

I have loaded and plotted a FITS file in python.
With the help of a previous post, I have managed to get the conversion of the axis from pixels to celestial coordinates. But I can't manage to get them in milliarcseconds (mas) correctly.
The code is the following
import numpy as np
import matplotlib.pyplot as plt
import astropy.units as u
from astropy.wcs import WCS
from astropy.io import fits
from astropy.utils.data import get_pkg_data_filename
filename = get_pkg_data_filename('hallo.fits')
hdu = fits.open(filename)[0]
wcs = WCS(hdu.header).celestial
wcs.wcs.crval = [0,0]
plt.subplot(projection=wcs)
plt.imshow(hdu.data[0][0], origin='lower')
plt.xlim(200,800)
plt.ylim(200,800)
plt.xlabel('Relative R.A ()')
plt.ylabel('Relative Dec ()')
plt.colorbar()
The output looks like
The y-label is cut for some reason, I do not know.
As it was shown in another post, one could use
wcs.wcs.ctype = [ 'XOFFSET' , 'YOFFSET' ]
to switch it to milliarcsecond, and I get
but the scale is incorrect!.
For instance, 0deg00min00.02sec should be 20 mas and not 0.000002!
Did I miss something here?

Looks like a spectral index map. Nice!
I think the issue might be that FITS implicitly uses degrees for values like CDELT. And they should be converted to mas explicitly for the plot.
The most straightforward way is to multiply CDELT values by 3.6e6 to convert from degrees to mas.
However, there is a more general approach which could be useful if you want to convert to different units at some point:
import astropy.units as u
w.wcs.cdelt = (w.wcs.cdelt * u.deg).to(u.mas)
So it basically says first that the units of CDELT are degrees and then converts them to mas.
The whole workflow is like this:
def make_transform(f):
'''use already read-in FITS file object f to build pixel-to-mas transformation'''
print("Making a transformation out of a FITS header")
w = WCS(f[0].header)
w = w.celestial
w.wcs.crval = [0, 0]
w.wcs.ctype = [ 'XOFFSET' , 'YOFFSET' ]
w.wcs.cunit = ['mas' , 'mas']
w.wcs.cdelt = (w.wcs.cdelt * u.deg).to(u.mas)
print(w.world_axis_units)
return w
def read_fits(file):
'''read fits file into object'''
try:
res = fits.open(file)
return res
except:
return None
def start_plot(i,df=None, w=None, xlim = [None, None], ylim=[None, None]):
'''starts a plot and returns fig,ax .
xlim, ylim - axes limits in mas
'''
# make a transformation
# Using a dataframe
if df is not None:
w = make_transform_df(df)
# using a header
if w is not None:
pass
# not making but using one from the arg list
else:
w = make_transform(i)
# print('In start_plot using the following transformation:\n {}'.format(w))
fig = plt.figure()
if w.naxis == 4:
ax = plt.subplot(projection = w, slices = ('x', 'y', 0 ,0 ))
elif w.naxis == 2:
ax = plt.subplot(projection = w)
# convert xlim, ylim to coordinates of BLC and TRC, perform transformation, then return back to xlim, ylim in pixels
if any(xlim) and any(ylim):
xlim_pix, ylim_pix = limits_mas2pix(xlim, ylim, w)
ax.set_xlim(xlim_pix)
ax.set_ylim(ylim_pix)
fig.add_axes(ax) # note that the axes have to be explicitly added to the figure
return fig, ax
rm = read_fits(file)
wr = make_transform(rm)
fig, ax = start_plot(RM, w=wr, xlim = xlim, ylim = ylim)
Then just plot to the axes ax with imshow or contours or whatever.
Of course, this piece of code could be reduced to meet your particular needs.

Related

Is there a library that will help me fit data easily? I found fitter and i will provide the code but it shows some errors

So, here is my code:
import pandas as pd
import scipy.stats as st
import matplotlib.pyplot as plt
from matplotlib.ticker import AutoMinorLocator
from fitter import Fitter, get_common_distributions
df = pd.read_csv("project3.csv")
bins = [282.33, 594.33, 906.33, 1281.33, 15030.33, 1842.33, 2154.33, 2466.33, 2778.33, 3090.33, 3402.33]
#declaring
facecolor = '#EAEAEA'
color_bars = '#3475D0'
txt_color1 = '#252525'
txt_color2 = '#004C74'
fig, ax = plt.subplots(1, figsize=(16, 6), facecolor=facecolor)
ax.set_facecolor(facecolor)
n, bins, patches = plt.hist(df.City1, color=color_bars, bins=10)
#grid
minor_locator = AutoMinorLocator(2)
plt.gca().xaxis.set_minor_locator(minor_locator)
plt.grid(which='minor', color=facecolor, lw = 0.5)
xticks = [(bins[idx+1] + value)/2 for idx, value in enumerate(bins[:-1])]
xticks_labels = [ "{:.0f}-{:.0f}".format(value, bins[idx+1]) for idx, value in enumerate(bins[:-1])]
plt.xticks(xticks, labels=xticks_labels, c=txt_color1, fontsize=13)
#beautify
ax.tick_params(axis='x', which='both',length=0)
plt.yticks([])
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
for idx, value in enumerate(n):
if value > 0:
plt.text(xticks[idx], value+5, int(value), ha='center', fontsize=16, c=txt_color1)
plt.title('Histogram of rainfall in City1\n', loc = 'right', fontsize = 20, c=txt_color1)
plt.xlabel('\nCentimeters of rainfall', c=txt_color2, fontsize=14)
plt.ylabel('Frequency of occurrence', c=txt_color2, fontsize=14)
plt.tight_layout()
#plt.savefig('City1_Raw.png', facecolor=facecolor)
plt.show()
city1 = df['City1'].values
f = Fitter(city1, distributions=get_common_distributions())
f.fit()
fig = f.plot_pdf(names=None, Nbest=4, lw=1, method='sumsquare_error')
plt.show()
print(f.get_best(method = 'sumsquare_error'))
The issue is with the plots it shows. The first histogram it generates is
Next I get another graph with best fitted distributions which is
Then an output statement
{'chi2': {'df': 10.692966790090342, 'loc': 16.690849400411103, 'scale': 118.71595997157786}}
Process finished with exit code 0
I have a couple of questions. Why is chi2, the best fitted distribution not plotted on the graph?
How do I plot these distributions on top of the histograms and not separately? The hist() function in fitter library can do that but there I don't get to control the bins and so I end up getting like 100 bins with some flat looking data.
How do I solve this issue? I need to plot the best fit curve on the histogram that looks like image1. Can I use any other module/package to get the work done in similar way? This uses least squares fit but I am OK with least likelihood or log likelihood too.
Simple way of plotting things on top of each other (using some properties of the Fitter class)
import scipy.stats as st
import matplotlib.pyplot as plt
from fitter import Fitter, get_common_distributions
from scipy import stats
numberofpoints=50000
df = stats.norm.rvs( loc=1090, scale=500, size=numberofpoints)
fig, ax = plt.subplots(1, figsize=(16, 6))
n, bins, patches = ax.hist( df, bins=30, density=True)
f = Fitter(df, distributions=get_common_distributions())
f.fit()
errorlist = sorted(
[
[f._fitted_errors[dist], dist]
for dist in get_common_distributions()
]
)[:4]
for err, dist in errorlist:
ax.plot( f.x, f.fitted_pdf[dist] )
plt.show()
Using the histogram normalization, one would need to play with scaling to generalize again.

Pyplot: subsequent plots with a gradient of colours [duplicate]

I am plotting multiple lines on a single plot and I want them to run through the spectrum of a colormap, not just the same 6 or 7 colors. The code is akin to this:
for i in range(20):
for k in range(100):
y[k] = i*x[i]
plt.plot(x,y)
plt.show()
Both with colormap "jet" and another that I imported from seaborn, I get the same 7 colors repeated in the same order. I would like to be able to plot up to ~60 different lines, all with different colors.
The Matplotlib colormaps accept an argument (0..1, scalar or array) which you use to get colors from a colormap. For example:
col = pl.cm.jet([0.25,0.75])
Gives you an array with (two) RGBA colors:
array([[ 0. , 0.50392157, 1. , 1. ],
[ 1. , 0.58169935, 0. , 1. ]])
You can use that to create N different colors:
import numpy as np
import matplotlib.pylab as pl
x = np.linspace(0, 2*np.pi, 64)
y = np.cos(x)
pl.figure()
pl.plot(x,y)
n = 20
colors = pl.cm.jet(np.linspace(0,1,n))
for i in range(n):
pl.plot(x, i*y, color=colors[i])
Bart's solution is nice and simple but has two shortcomings.
plt.colorbar() won't work in a nice way because the line plots aren't mappable (compared to, e.g., an image)
It can be slow for large numbers of lines due to the for loop (though this is maybe not a problem for most applications?)
These issues can be addressed by using LineCollection. However, this isn't too user-friendly in my (humble) opinion. There is an open suggestion on GitHub for adding a multicolor line plot function, similar to the plt.scatter(...) function.
Here is a working example I was able to hack together
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
def multiline(xs, ys, c, ax=None, **kwargs):
"""Plot lines with different colorings
Parameters
----------
xs : iterable container of x coordinates
ys : iterable container of y coordinates
c : iterable container of numbers mapped to colormap
ax (optional): Axes to plot on.
kwargs (optional): passed to LineCollection
Notes:
len(xs) == len(ys) == len(c) is the number of line segments
len(xs[i]) == len(ys[i]) is the number of points for each line (indexed by i)
Returns
-------
lc : LineCollection instance.
"""
# find axes
ax = plt.gca() if ax is None else ax
# create LineCollection
segments = [np.column_stack([x, y]) for x, y in zip(xs, ys)]
lc = LineCollection(segments, **kwargs)
# set coloring of line segments
# Note: I get an error if I pass c as a list here... not sure why.
lc.set_array(np.asarray(c))
# add lines to axes and rescale
# Note: adding a collection doesn't autoscalee xlim/ylim
ax.add_collection(lc)
ax.autoscale()
return lc
Here is a very simple example:
xs = [[0, 1],
[0, 1, 2]]
ys = [[0, 0],
[1, 2, 1]]
c = [0, 1]
lc = multiline(xs, ys, c, cmap='bwr', lw=2)
Produces:
And something a little more sophisticated:
n_lines = 30
x = np.arange(100)
yint = np.arange(0, n_lines*10, 10)
ys = np.array([x + b for b in yint])
xs = np.array([x for i in range(n_lines)]) # could also use np.tile
colors = np.arange(n_lines)
fig, ax = plt.subplots()
lc = multiline(xs, ys, yint, cmap='bwr', lw=2)
axcb = fig.colorbar(lc)
axcb.set_label('Y-intercept')
ax.set_title('Line Collection with mapped colors')
Produces:
Hope this helps!
An anternative to Bart's answer, in which you do not specify the color in each call to plt.plot is to define a new color cycle with set_prop_cycle. His example can be translated into the following code (I've also changed the import of matplotlib to the recommended style):
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 2*np.pi, 64)
y = np.cos(x)
n = 20
ax = plt.axes()
ax.set_prop_cycle('color',[plt.cm.jet(i) for i in np.linspace(0, 1, n)])
for i in range(n):
plt.plot(x, i*y)
If you are using continuous color pallets like brg, hsv, jet or the default one then you can do like this:
color = plt.cm.hsv(r) # r is 0 to 1 inclusive
Now you can pass this color value to any API you want like this:
line = matplotlib.lines.Line2D(xdata, ydata, color=color)
This approach seems to me like the most concise, user-friendly and does not require a loop to be used. It does not rely on user-made functions either.
import numpy as np
import matplotlib.pyplot as plt
# make 5 lines
n_lines = 5
x = np.arange(0, 2).reshape(-1, 1)
A = np.linspace(0, 2, n_lines).reshape(1, -1)
Y = x # A
# create colormap
cm = plt.cm.bwr(np.linspace(0, 1, n_lines))
# plot
ax = plt.subplot(111)
ax.set_prop_cycle('color', list(cm))
ax.plot(x, Y)
plt.show()
Resulting figure here

How to make the fluctuation range of three polylines all obvious in same figure by matplotlib? [duplicate]

I'm trying to create a plot using pyplot that has a discontinuous x-axis. The usual way this is drawn is that the axis will have something like this:
(values)----//----(later values)
where the // indicates that you're skipping everything between (values) and (later values).
I haven't been able to find any examples of this, so I'm wondering if it's even possible. I know you can join data over a discontinuity for, eg, financial data, but I'd like to make the jump in the axis more explicit. At the moment I'm just using subplots but I'd really like to have everything end up on the same graph in the end.
Paul's answer is a perfectly fine method of doing this.
However, if you don't want to make a custom transform, you can just use two subplots to create the same effect.
Rather than put together an example from scratch, there's an excellent example of this written by Paul Ivanov in the matplotlib examples (It's only in the current git tip, as it was only committed a few months ago. It's not on the webpage yet.).
This is just a simple modification of this example to have a discontinuous x-axis instead of the y-axis. (Which is why I'm making this post a CW)
Basically, you just do something like this:
import matplotlib.pylab as plt
import numpy as np
# If you're not familiar with np.r_, don't worry too much about this. It's just
# a series with points from 0 to 1 spaced at 0.1, and 9 to 10 with the same spacing.
x = np.r_[0:1:0.1, 9:10:0.1]
y = np.sin(x)
fig,(ax,ax2) = plt.subplots(1, 2, sharey=True)
# plot the same data on both axes
ax.plot(x, y, 'bo')
ax2.plot(x, y, 'bo')
# zoom-in / limit the view to different portions of the data
ax.set_xlim(0,1) # most of the data
ax2.set_xlim(9,10) # outliers only
# hide the spines between ax and ax2
ax.spines['right'].set_visible(False)
ax2.spines['left'].set_visible(False)
ax.yaxis.tick_left()
ax.tick_params(labeltop='off') # don't put tick labels at the top
ax2.yaxis.tick_right()
# Make the spacing between the two axes a bit smaller
plt.subplots_adjust(wspace=0.15)
plt.show()
To add the broken axis lines // effect, we can do this (again, modified from Paul Ivanov's example):
import matplotlib.pylab as plt
import numpy as np
# If you're not familiar with np.r_, don't worry too much about this. It's just
# a series with points from 0 to 1 spaced at 0.1, and 9 to 10 with the same spacing.
x = np.r_[0:1:0.1, 9:10:0.1]
y = np.sin(x)
fig,(ax,ax2) = plt.subplots(1, 2, sharey=True)
# plot the same data on both axes
ax.plot(x, y, 'bo')
ax2.plot(x, y, 'bo')
# zoom-in / limit the view to different portions of the data
ax.set_xlim(0,1) # most of the data
ax2.set_xlim(9,10) # outliers only
# hide the spines between ax and ax2
ax.spines['right'].set_visible(False)
ax2.spines['left'].set_visible(False)
ax.yaxis.tick_left()
ax.tick_params(labeltop='off') # don't put tick labels at the top
ax2.yaxis.tick_right()
# Make the spacing between the two axes a bit smaller
plt.subplots_adjust(wspace=0.15)
# This looks pretty good, and was fairly painless, but you can get that
# cut-out diagonal lines look with just a bit more work. The important
# thing to know here is that in axes coordinates, which are always
# between 0-1, spine endpoints are at these locations (0,0), (0,1),
# (1,0), and (1,1). Thus, we just need to put the diagonals in the
# appropriate corners of each of our axes, and so long as we use the
# right transform and disable clipping.
d = .015 # how big to make the diagonal lines in axes coordinates
# arguments to pass plot, just so we don't keep repeating them
kwargs = dict(transform=ax.transAxes, color='k', clip_on=False)
ax.plot((1-d,1+d),(-d,+d), **kwargs) # top-left diagonal
ax.plot((1-d,1+d),(1-d,1+d), **kwargs) # bottom-left diagonal
kwargs.update(transform=ax2.transAxes) # switch to the bottom axes
ax2.plot((-d,d),(-d,+d), **kwargs) # top-right diagonal
ax2.plot((-d,d),(1-d,1+d), **kwargs) # bottom-right diagonal
# What's cool about this is that now if we vary the distance between
# ax and ax2 via f.subplots_adjust(hspace=...) or plt.subplot_tool(),
# the diagonal lines will move accordingly, and stay right at the tips
# of the spines they are 'breaking'
plt.show()
I see many suggestions for this feature but no indication that it's been implemented. Here is a workable solution for the time-being. It applies a step-function transform to the x-axis. It's a lot of code, but it's fairly simple since most of it is boilerplate custom scale stuff. I have not added any graphics to indicate the location of the break, since that is a matter of style. Good luck finishing the job.
from matplotlib import pyplot as plt
from matplotlib import scale as mscale
from matplotlib import transforms as mtransforms
import numpy as np
def CustomScaleFactory(l, u):
class CustomScale(mscale.ScaleBase):
name = 'custom'
def __init__(self, axis, **kwargs):
mscale.ScaleBase.__init__(self)
self.thresh = None #thresh
def get_transform(self):
return self.CustomTransform(self.thresh)
def set_default_locators_and_formatters(self, axis):
pass
class CustomTransform(mtransforms.Transform):
input_dims = 1
output_dims = 1
is_separable = True
lower = l
upper = u
def __init__(self, thresh):
mtransforms.Transform.__init__(self)
self.thresh = thresh
def transform(self, a):
aa = a.copy()
aa[a>self.lower] = a[a>self.lower]-(self.upper-self.lower)
aa[(a>self.lower)&(a<self.upper)] = self.lower
return aa
def inverted(self):
return CustomScale.InvertedCustomTransform(self.thresh)
class InvertedCustomTransform(mtransforms.Transform):
input_dims = 1
output_dims = 1
is_separable = True
lower = l
upper = u
def __init__(self, thresh):
mtransforms.Transform.__init__(self)
self.thresh = thresh
def transform(self, a):
aa = a.copy()
aa[a>self.lower] = a[a>self.lower]+(self.upper-self.lower)
return aa
def inverted(self):
return CustomScale.CustomTransform(self.thresh)
return CustomScale
mscale.register_scale(CustomScaleFactory(1.12, 8.88))
x = np.concatenate((np.linspace(0,1,10), np.linspace(9,10,10)))
xticks = np.concatenate((np.linspace(0,1,6), np.linspace(9,10,6)))
y = np.sin(x)
plt.plot(x, y, '.')
ax = plt.gca()
ax.set_xscale('custom')
ax.set_xticks(xticks)
plt.show()
Check the brokenaxes package:
import matplotlib.pyplot as plt
from brokenaxes import brokenaxes
import numpy as np
fig = plt.figure(figsize=(5,2))
bax = brokenaxes(
xlims=((0, .1), (.4, .7)),
ylims=((-1, .7), (.79, 1)),
hspace=.05
)
x = np.linspace(0, 1, 100)
bax.plot(x, np.sin(10 * x), label='sin')
bax.plot(x, np.cos(10 * x), label='cos')
bax.legend(loc=3)
bax.set_xlabel('time')
bax.set_ylabel('value')
A very simple hack is to
scatter plot rectangles over the axes' spines and
draw the "//" as text at that position.
Worked like a charm for me:
# FAKE BROKEN AXES
# plot a white rectangle on the x-axis-spine to "break" it
xpos = 10 # x position of the "break"
ypos = plt.gca().get_ylim()[0] # y position of the "break"
plt.scatter(xpos, ypos, color='white', marker='s', s=80, clip_on=False, zorder=100)
# draw "//" on the same place as text
plt.text(xpos, ymin-0.125, r'//', fontsize=label_size, zorder=101, horizontalalignment='center', verticalalignment='center')
Example Plot:
For those interested, I've expanded upon #Paul's answer and added it to the matplotlib wrapper proplot. It can do axis "jumps", "speedups", and "slowdowns".
There is no way currently to add "crosses" that indicate the discrete jump like in Joe's answer, but I plan to add this in the future. I also plan to add a default "tick locator" that sets sensible default tick locations depending on the CutoffScale arguments.
Adressing Frederick Nord's question how to enable parallel orientation of the diagonal "breaking" lines when using a gridspec with ratios unequal 1:1, the following changes based on the proposals of Paul Ivanov and Joe Kingtons may be helpful. Width ratio can be varied using variables n and m.
import matplotlib.pylab as plt
import numpy as np
import matplotlib.gridspec as gridspec
x = np.r_[0:1:0.1, 9:10:0.1]
y = np.sin(x)
n = 5; m = 1;
gs = gridspec.GridSpec(1,2, width_ratios = [n,m])
plt.figure(figsize=(10,8))
ax = plt.subplot(gs[0,0])
ax2 = plt.subplot(gs[0,1], sharey = ax)
plt.setp(ax2.get_yticklabels(), visible=False)
plt.subplots_adjust(wspace = 0.1)
ax.plot(x, y, 'bo')
ax2.plot(x, y, 'bo')
ax.set_xlim(0,1)
ax2.set_xlim(10,8)
# hide the spines between ax and ax2
ax.spines['right'].set_visible(False)
ax2.spines['left'].set_visible(False)
ax.yaxis.tick_left()
ax.tick_params(labeltop='off') # don't put tick labels at the top
ax2.yaxis.tick_right()
d = .015 # how big to make the diagonal lines in axes coordinates
# arguments to pass plot, just so we don't keep repeating them
kwargs = dict(transform=ax.transAxes, color='k', clip_on=False)
on = (n+m)/n; om = (n+m)/m;
ax.plot((1-d*on,1+d*on),(-d,d), **kwargs) # bottom-left diagonal
ax.plot((1-d*on,1+d*on),(1-d,1+d), **kwargs) # top-left diagonal
kwargs.update(transform=ax2.transAxes) # switch to the bottom axes
ax2.plot((-d*om,d*om),(-d,d), **kwargs) # bottom-right diagonal
ax2.plot((-d*om,d*om),(1-d,1+d), **kwargs) # top-right diagonal
plt.show()
This is a hacky but pretty solution for x-axis breaks.
The solution is based on https://matplotlib.org/stable/gallery/subplots_axes_and_figures/broken_axis.html, which gets rid of the problem with positioning the break above the spine, solved by How can I plot points so they appear over top of the spines with matplotlib?
from matplotlib.patches import Rectangle
import matplotlib.pyplot as plt
def axis_break(axis, xpos=[0.1, 0.125], slant=1.5):
d = slant # proportion of vertical to horizontal extent of the slanted line
anchor = (xpos[0], -1)
w = xpos[1] - xpos[0]
h = 1
kwargs = dict(marker=[(-1, -d), (1, d)], markersize=12, zorder=3,
linestyle="none", color='k', mec='k', mew=1, clip_on=False)
axis.add_patch(Rectangle(
anchor, w, h, fill=True, color="white",
transform=axis.transAxes, clip_on=False, zorder=3)
)
axis.plot(xpos, [0, 0], transform=axis.transAxes, **kwargs)
fig, ax = plt.subplots(1,1)
plt.plot(np.arange(10))
axis_break(ax, xpos=[0.1, 0.12], slant=1.5)
axis_break(ax, xpos=[0.3, 0.31], slant=-10)
if you want to replace an axis label, this would do the trick:
from matplotlib import ticker
def replace_pos_with_label(fig, pos, label, axis):
fig.canvas.draw() # this is needed to set up the x-ticks
labs = axis.get_xticklabels()
labels = []
locs = []
for text in labs:
x = text._x
lab = text._text
if x == pos:
lab = label
labels.append(lab)
locs.append(x)
axis.xaxis.set_major_locator(ticker.FixedLocator(locs))
axis.set_xticklabels(labels)
fig, ax = plt.subplots(1,1)
plt.plot(np.arange(10))
replace_pos_with_label(fig, 0, "-10", axis=ax)
replace_pos_with_label(fig, 6, "$10^{4}$", axis=ax)
axis_break(ax, xpos=[0.1, 0.12], slant=2)

Draw 3D Plot for Gensim model

I have trained my model using Gensim. I draw a 2D plot using PCA but it is not clear too much. I wanna change it to 3D with capable of zooming .my result is so dense.
from sklearn.decomposition import PCA
from matplotlib import pyplot
X=model[model.wv.vocab]
pca=PCA(n_components=2)
result=pca.fit_transform(X)
pyplot.scatter(result[:,0],result[:,1])
word=list(model.wv.most_similar('eden_lake'))
for i, word in enumerate(words):
pyplot.annotate(word, xy=(result[i, 0], result[i, 1]))
pyplot.show()
And the result:
it possible to do that?
The following function uses t-SNE instead of PCA for dimension reduction, but will generate a plot in two, three or both two and three dimensions (using subplots). Furthermore, it will color the topics for you so it's easier to distinguish them. Adding %matplotlib notebook to the start of a Jupyter notebook environment from anaconda will allow a 3d plot to be rotated and a 2d plot to be zoomed (don't do both versions at the same time with %matplotlib notebook).
The function is very long, with most of the code being for plot formatting, but produces a professional output.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
import seaborn as sns
from gensim.models import LdaModel
from gensim import corpora
from sklearn.manifold import TSNE
# %matplotlib notebook # if in Jupyter for rotating and zooming
def LDA_tSNE_topics_vis(dimension='both',
corpus=None,
num_topics=10,
remove_3d_outliers=False,
save_png=False):
"""
Returns the outputs of an LDA model plotted using t-SNE (t-distributed Stochastic Neighbor Embedding)
Note: t-SNE reduces the dimensionality of a space such that similar points will be closer and dissimilar points farther
Parameters
----------
dimension : str (default=both)
The dimension that t-SNE should reduce the data to for visualization
Options: 2d, 3d, and both (a plot with two subplots)
corpus : list, list of lists
The tokenized and cleaned text corpus over which analysis should be done
num_topics : int (default=10)
The number of categories for LDA based approaches
remove_3d_outliers : bool (default=False)
Whether to remove outliers from a 3d plot
save_png : bool (default=False)
Whether to save the figure as a png
Returns
-------
A t-SNE lower dimensional representation of an LDA model's topics and their constituent members
"""
dirichlet_dict = corpora.Dictionary(corpus)
bow_corpus = [dirichlet_dict.doc2bow(text) for text in corpus]
dirichlet_model = LdaModel(corpus=bow_corpus,
id2word=dirichlet_dict,
num_topics=num_topics,
update_every=1,
chunksize=len(bow_corpus),
passes=10,
alpha='auto',
random_state=42) # set for testing
df_topic_coherences = pd.DataFrame(columns = ['topic_{}'.format(i) for i in range(num_topics)])
for i in range(len(bow_corpus)):
df_topic_coherences.loc[i] = [0] * num_topics
output = dirichlet_model.__getitem__(bow=bow_corpus[i], eps=0)
for j in range(len(output)):
topic_num = output[j][0]
coherence = output[j][1]
df_topic_coherences.iloc[i, topic_num] = coherence
for i in range(num_topics):
df_topic_coherences.iloc[:, i] = df_topic_coherences.iloc[:, i].astype('float64', copy=False)
df_topic_coherences['main_topic'] = df_topic_coherences.iloc[:, :num_topics].idxmax(axis=1)
if num_topics > 10:
# cubehelix better for more than 10 colors
colors = sns.color_palette("cubehelix", num_topics)
else:
# The default sns color palette
colors = sns.color_palette('deep', num_topics)
tsne_2 = None
tsne_3 = None
if dimension == 'both':
tsne_2 = TSNE(n_components=2, perplexity=40, n_iter=300)
tsne_3 = TSNE(n_components=3, perplexity=40, n_iter=300)
elif dimension == '2d':
tsne_2 = TSNE(n_components=2, perplexity=40, n_iter=300)
elif dimension == '3d':
tsne_3 = TSNE(n_components=3, perplexity=40, n_iter=300)
else:
ValueError("An invalid value has been passed to the 'dimension' argument - choose from 2d, 3d, or both.")
if tsne_2 is not None:
tsne_results_2 = tsne_2.fit_transform(df_topic_coherences.iloc[:, :num_topics])
df_tsne_2 = pd.DataFrame()
df_tsne_2['tsne-2d-d1'] = tsne_results_2[:,0]
df_tsne_2['tsne-2d-d2'] = tsne_results_2[:,1]
df_tsne_2['main_topic'] = df_topic_coherences.iloc[:, num_topics]
df_tsne_2['color'] = [colors[int(t.split('_')[1])] for t in df_tsne_2['main_topic']]
df_tsne_2['topic_num'] = [int(i.split('_')[1]) for i in df_tsne_2['main_topic']]
df_tsne_2 = df_tsne_2.sort_values(['topic_num'], ascending = True).drop('topic_num', axis=1)
if tsne_3 is not None:
colors = [c for c in sns.color_palette()]
tsne_results_3 = tsne_3.fit_transform(df_topic_coherences.iloc[:, :num_topics])
df_tsne_3 = pd.DataFrame()
df_tsne_3['tsne-3d-d1'] = tsne_results_3[:,0]
df_tsne_3['tsne-3d-d2'] = tsne_results_3[:,1]
df_tsne_3['tsne-3d-d3'] = tsne_results_3[:,2]
df_tsne_3['main_topic'] = df_topic_coherences.iloc[:, num_topics]
df_tsne_3['color'] = [colors[int(t.split('_')[1])] for t in df_tsne_3['main_topic']]
df_tsne_3['topic_num'] = [int(i.split('_')[1]) for i in df_tsne_3['main_topic']]
df_tsne_3 = df_tsne_3.sort_values(['topic_num'], ascending = True).drop('topic_num', axis=1)
if remove_3d_outliers:
# Remove those rows with values that are more than three standard deviations from the column mean
for col in ['tsne-3d-d1', 'tsne-3d-d2', 'tsne-3d-d3']:
df_tsne_3 = df_tsne_3[np.abs(df_tsne_3[col] - df_tsne_3[col].mean()) <= (3 * df_tsne_3[col].std())]
if tsne_2 is not None and tsne_3 is not None:
fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, # pylint: disable=unused-variable
figsize=(20,10))
ax1.axis('off')
else:
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(20,10))
if tsne_2 is not None and tsne_3 is not None:
# Plot tsne_2, with tsne_3 being added later
ax1 = sns.scatterplot(data=df_tsne_2, x="tsne-2d-d1", y="tsne-2d-d2",
hue=df_topic_coherences.iloc[:, num_topics], alpha=0.3)
light_grey_tup = (242/256, 242/256, 242/256)
ax1.set_facecolor(light_grey_tup)
ax1.axes.set_title('t-SNE 2-Dimensional Representation', fontsize=25)
ax1.set_xlabel('tsne-d1', fontsize=20)
ax1.set_ylabel('tsne-d2', fontsize=20)
handles, labels = ax1.get_legend_handles_labels()
legend_order = list(np.argsort([i.split('_')[1] for i in labels]))
ax1.legend([handles[i] for i in legend_order], [labels[i] for i in legend_order],
facecolor=light_grey_tup)
elif tsne_2 is not None:
# Plot just tsne_2
ax = sns.scatterplot(data=df_tsne_2, x="tsne-2d-d1", y="tsne-2d-d2",
hue=df_topic_coherences.iloc[:, num_topics], alpha=0.3)
ax.set_facecolor(light_grey_tup)
ax.axes.set_title('t-SNE 2-Dimensional Representation', fontsize=25)
ax.set_xlabel('tsne-d1', fontsize=20)
ax.set_ylabel('tsne-d2', fontsize=20)
handles, labels = ax.get_legend_handles_labels()
legend_order = list(np.argsort([i.split('_')[1] for i in labels]))
ax.legend([handles[i] for i in legend_order], [labels[i] for i in legend_order],
facecolor=light_grey_tup)
if tsne_2 is not None and tsne_3 is not None:
# tsne_2 has been plotted, so add tsne_3
ax2 = fig.add_subplot(121, projection='3d')
ax2.scatter(xs=df_tsne_3['tsne-3d-d1'],
ys=df_tsne_3['tsne-3d-d2'],
zs=df_tsne_3['tsne-3d-d3'],
c=df_tsne_3['color'],
alpha=0.3)
ax2.set_facecolor('white')
ax2.axes.set_title('t-SNE 3-Dimensional Representation', fontsize=25)
ax2.set_xlabel('tsne-d1', fontsize=20)
ax2.set_ylabel('tsne-d2', fontsize=20)
ax2.set_zlabel('tsne-d3', fontsize=20)
with plt.rc_context({"lines.markeredgewidth" : 0}):
# Add handles via blank lines and order their colors to match tsne_2
proxy_handles = [Line2D([0], [0], linestyle="none", marker='o', markersize=8,
markerfacecolor=colors[i]) for i in legend_order]
ax2.legend(proxy_handles, ['topic_{}'.format(i) for i in range(num_topics)],
loc='upper left', facecolor=(light_grey_tup))
elif tsne_3 is not None:
# Plot just tsne_3
ax.axis('off')
ax.set_facecolor('white')
ax = fig.add_subplot(111, projection='3d')
ax.scatter(xs=df_tsne_3['tsne-3d-d1'],
ys=df_tsne_3['tsne-3d-d2'],
zs=df_tsne_3['tsne-3d-d3'],
c=df_tsne_3['color'],
alpha=0.3)
ax.set_facecolor('white')
ax.axes.set_title('t-SNE 3-Dimensional Representation', fontsize=25)
ax.set_xlabel('tsne-d1', fontsize=20)
ax.set_ylabel('tsne-d2', fontsize=20)
ax.set_zlabel('tsne-d3', fontsize=20)
with plt.rc_context({"lines.markeredgewidth" : 0}):
# Add handles via blank lines
proxy_handles = [Line2D([0], [0], linestyle="none", marker='o', markersize=8,
markerfacecolor=colors[i]) for i in range(len(colors))]
ax.legend(proxy_handles, ['topic_{}'.format(i) for i in range(num_topics)],
loc='upper left', facecolor=light_grey_tup)
if save_png:
plt.savefig('LDA_tSNE_{}.png'.format(time.strftime("%Y%m%d-%H%M%S")), bbox_inches='tight', dpi=500)
plt.show()
An example plot for both 2d and 3d (with outliers removed) representations of a 10 topic gensim LDA model on subplots would be:
Yes, in principle it is possible to do 3D visualization of LDA model results. Here is more information about using T-SNE for that.

Unable to plot circles on a map projection in basemap using Python

I'm trying to plot circles on a miller projection map using a center latitude, longitude and radius. I can't get the circles to show up on the map projection. I've tried plotting them using different techniques as shown in the links.
How to plot a circle in basemap or add artiste
How to make smooth circles on basemap projections
Here is my code:
def plot_notams(dict_of_filtered_notams):
''' Create a map of the US and plot all NOTAMS from a given time period.'''
'''Create the map'''
fig = plt.figure(figsize=(8,6), dpi=200)
ax = fig.add_subplot(111)
m = Basemap(projection='mill',llcrnrlat=20, urcrnrlat=55, llcrnrlon=-135, urcrnrlon=-60, resolution='h')
m.drawcoastlines()
m.drawcountries(linewidth=2)
m.drawstates()
m.fillcontinents(color='coral', lake_color='aqua')
m.drawmapboundary(fill_color='aqua')
m.drawmeridians(np.arange(-130, -65, 10), labels=[1,0,0,1], textcolor='black')
m.drawparallels(np.arange(20, 60, 5), labels=[1,0,0,1], textcolor='black')
''' Now add the NOTAMS to the map '''
notam_data = dict_of_filtered_notams['final_notam_list']
for line in notam_data:
notam_lat = float(line.split()[0])
notam_lon = float(line.split()[1])
coords = convert_coords(notam_lon, notam_lat)
notam_lon, notam_lat = coords[0], coords[1]
FL400_radius = np.radians(float(line.split()[2]))
x,y = m(notam_lon, notam_lat)
print("notam_lon = ",notam_lon, "notam_lat = ", notam_lat,"\n")
print("x,y values = ",'%.3f'%x,",",'%.3f'%y,"\n")
print("FL400_radius = ",('% 3.2f' % FL400_radius))
print("")
cir = plt.Circle((x,y), FL400_radius, color="white", fill=False)
ax.add_patch(cir)
(The convert_coords function is simply formatting the notam_lon/notam_lat values into a usable format as shown in the data below.)
Here is what my data looks like (you can see where it's being printed in the code above):
notam_lon = -117.7839 notam_lat = 39.6431
x,y values = 1914342.075 , 2398770.441
FL400_radius = 6.98
Here's an image of what my code above produces:
I also tried using the map.plot() function (specifically, m.plot(x,y, "o")) in place of "ax.add_patch(cir)." That worked but plotted points or "o's," of course. Here's the image produced by replacing "ax.add_patch(cir)" with "m.plot(x,y, "o")."
And as a final note, I'm using basemap 1.2.0-1 and matplotlib 3.0.3. I haven't found any indication that these versions are incompatible. Also, this inability to plot a circle wasn't an issue 2 months ago when I did this last. I'm at a loss here. I appreciate any feedback. Thank you.
To plot circles on a map, you need appropriate locations (x,y) and radius. Here is a working code and resulting plot.
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import numpy as np
# make up 10 data points for location of circles
notam_lon = np.linspace(-117.7839, -100, 10)
notam_lat = np.linspace(39.6431, 52, 10)
# original radius of circle is too small
FL400_radius = 6.98 # what unit?
fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111)
m = Basemap(projection='mill', llcrnrlat=20, urcrnrlat=55, llcrnrlon=-135, urcrnrlon=-60, resolution='l')
# radiusm = (m.ymax-m.ymin)/10. is good for check plot
radiusm = FL400_radius*10000 # meters, you adjust as needed here
for xi,yi in zip(notam_lon, notam_lat):
# xy=m(xi,yi): conversion (long,lat) to (x,y) on map
circle1 = plt.Circle(xy=m(xi,yi), radius=radiusm, \
edgecolor="blue", facecolor="yellow", zorder=10)
#ax.add_patch(circle1) # deprecated
ax.add_artist(circle1) # use this instead
m.drawcoastlines()
m.drawcountries(linewidth=2)
m.drawstates()
m.fillcontinents(color='coral', lake_color='aqua')
# m.drawmapboundary(fill_color='aqua') <-- causes deprecation warnings
# use this instead:
rect = plt.Rectangle((m.xmin,m.ymin), m.xmax-m.xmin, m.ymax-m.ymin, facecolor="aqua", zorder=-10)
ax.add_artist(rect)
m.drawmeridians(np.arange(-130, -65, 10), labels=[1,0,0,1], textcolor='black')
m.drawparallels(np.arange(20, 60, 5), labels=[1,0,0,1], textcolor='black')
plt.show()
The output map:
Hope this is useful.

Resources