Exception in `transform_non_affine` with log axis - python-3.x

I'm getting a weird error when I try to use ax.transData when plotting on a log scale. Here is minimal code to reproduce it:
#!/usr/bin/env python3
from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
from matplotlib.figure import Figure
fig = Figure(figsize=(8,6))
canvas = FigureCanvas(fig)
ax = fig.add_subplot(1,1,1)
ax.plot(range(10))
ax.set_yscale('log') # <--- works fine without this line
print(ax.transData.transform((1,1))) # <--- exception thrown here
canvas.print_figure('test.pdf')
The stack trace is as follows:
File "/usr/local/lib/python3.3/site-packages/matplotlib-1.3.1-py3.3-linux-x86_64.egg/matplotlib/transforms.py", line 1273, in transform
return self.transform_affine(self.transform_non_affine(values))
File "/usr/local/lib/python3.3/site-packages/matplotlib-1.3.1-py3.3-linux-x86_64.egg/matplotlib/transforms.py", line 2217, in transform_non_affine
return self._a.transform_non_affine(points)
File "/usr/local/lib/python3.3/site-packages/matplotlib-1.3.1-py3.3-linux-x86_64.egg/matplotlib/transforms.py", line 2002, in transform_non_affine
x_points = x.transform_non_affine(points)[:, 0:1]
TypeError: tuple indices must be integers, not tuple
If I comment out set_yscale('log'), it runs fine. Does anyone know why this transform doesn't work?

Not completely satisfying, but I found a workaround. The issue seems to be related to passing a one-dimensional array to transform. Oddly, it works if I use this:
pts = np.array((1, 1))  # the point from the question as a NumPy array (np = numpy)
print(ax.transData.transform(pts[None, :]))
In other words, I have to reshape the array to make it two-dimensional: shape (1, 2) instead of (2,).
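The same workaround can be wrapped in a small helper. This is a sketch, not a library API: `to_display` is a hypothetical name, and it uses `np.atleast_2d` in place of `pts[None, :]` (both turn a shape-(2,) point into a shape-(1, 2) array, the N x 2 layout that transforms accept in general). It also draws the canvas once so the autoscaled limits are final before transforming, and uses strictly positive data for the log axis.

```python
import numpy as np
from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
from matplotlib.figure import Figure


def to_display(ax, point):
    """Map one (x, y) data point to display (pixel) coordinates.

    np.atleast_2d reshapes a (2,) point to (1, 2), the 2-D input that
    avoided the error in the workaround; [0] unwraps the single row again.
    """
    pts = np.atleast_2d(np.asarray(point, dtype=float))
    return ax.transData.transform(pts)[0]


fig = Figure(figsize=(8, 6))
canvas = FigureCanvas(fig)
ax = fig.add_subplot(1, 1, 1)
ax.plot(range(1, 10))   # strictly positive y values for the log axis
ax.set_yscale('log')
canvas.draw()           # finalize the autoscaled view limits before transforming
print(to_display(ax, (1, 1)))
```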


Can I see all attributes of a pyplot without showing the graph?

I am working on developing homework as a TA for a course at my university.
We are using Otter Grader (an extension of OKPy) to grade student submissions of guided homework we provide through Jupyter Notebooks.
Students are being asked to plot horizontal lines on their plots using matplotlib.pyplot.axhline(), and I am hoping to use an assert call to determine whether they added the horizontal line to their plots.
Is there a way to see all attributes that have been added to a pyplot in matplotlib?
I don't believe there is a way to check whether the axhline method has been called, but you can check whether any lines are horizontal by inspecting the Line2D objects in the axes' lines attribute.
import matplotlib.pyplot as plt
import numpy as np
def is_horizontal(line2d):
    _, y = line2d.get_data()
    y = np.asarray(y)  # axhline stores its y data as a plain list, not a NumPy array
    # True only if every y value equals the first one
    return bool(np.all(y == y[0]))
t = np.linspace(-10, 10, 1000)
plt.plot(t, t**2)
plt.plot(t, t)
plt.axhline(y=5, xmin=0, xmax=1)  # xmin/xmax are fractions of the axes width (0 to 1), not data coordinates
ax = plt.gca()
assert any(map(is_horizontal, ax.lines)), 'There are no horizontal lines on the plot.'
plt.show()
This code raises an AssertionError unless at least one Line2D object in ax.lines has all of its y values equal.
Note that this works for axhline but not for the hlines method: hlines draws its segments as a LineCollection, which is stored in ax.collections instead of being added to ax.lines as a Line2D object.
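If students might use either axhline or hlines, a check can look in both places. The sketch below is an extension of the answer's idea, not an Otter Grader or matplotlib feature: `has_horizontal_line` and `_all_equal` are hypothetical helpers. It scans the Line2D objects in `ax.lines` and the segments of any `LineCollection` in `ax.collections` (via `get_segments()`), treating a line as horizontal when all of its y values are equal.

```python
import numpy as np
from matplotlib.collections import LineCollection
from matplotlib.figure import Figure


def _all_equal(values):
    """True if values is non-empty and every entry equals the first."""
    values = np.asarray(values, dtype=float)
    return values.size > 0 and bool(np.all(values == values[0]))


def has_horizontal_line(ax):
    """Return True if ax holds a horizontal line from axhline/plot or hlines."""
    # axhline (and plot) create Line2D objects stored in ax.lines
    for line in ax.lines:
        if _all_equal(line.get_ydata()):
            return True
    # hlines creates a LineCollection stored in ax.collections
    for coll in ax.collections:
        if isinstance(coll, LineCollection):
            for seg in coll.get_segments():  # each seg is an (N, 2) array of (x, y)
                if len(seg) > 0 and _all_equal(seg[:, 1]):
                    return True
    return False


fig = Figure()
ax = fig.add_subplot(1, 1, 1)
t = np.linspace(-10, 10, 1000)
ax.plot(t, t**2)
print(has_horizontal_line(ax))  # False: the parabola is not horizontal
ax.hlines(5, -10, 10)
print(has_horizontal_line(ax))  # True: found via ax.collections
```

In a grading cell, `assert has_horizontal_line(plt.gca()), 'There are no horizontal lines on the plot.'` would then accept either method.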

Scikit-learn Incremental PCA - ValueError: array must not contain infs or NaNs

I'm trying to use IncrementalPCA from scikit-learn. I really need the incremental version of the algorithm because of the online nature of my application. My code couldn't really be simpler:
from sklearn.decomposition import IncrementalPCA
import pandas as pd
with open('C:/My/File/Path/file.csv', 'r') as fp:
    data = pd.read_csv(fp)
ipca = IncrementalPCA(n_components=4)
ipca.fit(data)
but this is how it ends when I run it:
C:\Users\myuser\PycharmProjects\mushu\venv\lib\site-packages\sklearn\decomposition\_incremental_pca.py:293: RuntimeWarning: overflow encountered in long_scalars
np.sqrt((self.n_samples_seen_ * n_samples) /
C:\Users\myuser\PycharmProjects\mushu\venv\lib\site-packages\sklearn\decomposition\_incremental_pca.py:293: RuntimeWarning: invalid value encountered in sqrt
np.sqrt((self.n_samples_seen_ * n_samples) /
Traceback (most recent call last):
File "C:/Users/myuser/AppData/Roaming/JetBrains/PyCharmCE2020.1/scratches/scratch_9.py", line 6, in <module>
ipca.fit(data)
File "C:\Users\myuser\PycharmProjects\mushu\venv\lib\site-packages\sklearn\decomposition\_incremental_pca.py", line 215, in fit
self.partial_fit(X_batch, check_input=False)
File "C:\Users\myuser\PycharmProjects\mushu\venv\lib\site-packages\sklearn\decomposition\_incremental_pca.py", line 298, in partial_fit
U, S, V = linalg.svd(X, full_matrices=False)
File "C:\Users\myuser\PycharmProjects\mushu\venv\lib\site-packages\scipy\linalg\decomp_svd.py", line 106, in svd
a1 = _asarray_validated(a, check_finite=check_finite)
File "C:\Users\myuser\PycharmProjects\mushu\venv\lib\site-packages\scipy\_lib\_util.py", line 263, in _asarray_validated
a = toarray(a)
File "C:\Users\myuser\PycharmProjects\mushu\venv\lib\site-packages\numpy\lib\function_base.py", line 498, in asarray_chkfinite
raise ValueError(
ValueError: array must not contain infs or NaNs
Process finished with exit code 1
My data is 243 columns of only 0s and 1s. I already checked:
There is no NaN anywhere in my data
There is no inf anywhere in my data
I updated scikit-learn from v0.22.2.post1 to 0.23.1, with no difference
If I use PCA instead of IncrementalPCA, leaving everything else the same, everything works fine: no warnings, no errors, all good
There were similar issues in previous versions, but they refer to versions around 0.16/0.17, most involved more complex code, and all were fixed around those versions
If anyone could help me, I would be most grateful.
Edit:
My data, exactly as I feed them to the above code
https://drive.google.com/file/d/1JBIliADt9TViTk8qjnmIS3RFEO934dY6/view?usp=sharing
Edit 2:
Tried using both
data = pd.read_csv(fp, dtype = 'Int64')
and
data = pd.read_csv(fp, dtype = np.float64)
with no difference in results.
Edit 3:
The issue seems to be related to the dataset size. Fitting a smaller portion works fine until I get to around 1,800,000 rows; that's where the error starts showing.
I reported this to scikit-learn and they found the cause quickly. It happens because NumPy integers default to int32 on Windows: the product self.n_samples_seen_ * n_samples overflows (the RuntimeWarning at the top of the traceback), and that escalates into NaNs in the array passed to linalg.svd() inside partial_fit(). I'm temporarily moving to Linux while waiting for a fix.
Here is the issue, for anyone with similar problems who wants to track its resolution.
tl;dr: check the link above to see whether the issue is resolved. If it is not, use a batch_size such that batch_size * n_samples < 2^31 - 1, where n_samples is the total number of rows. If that's not possible for you, move to Linux.
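The overflow and the tl;dr rule can be checked with plain NumPy arithmetic. This sketch assumes, from the warning line in the traceback, that the overflowing expression is `n_samples_seen_ * n_samples`, and that `batch_size=None` gives IncrementalPCA's documented default of `5 * n_features`; `max_safe_batch_size` is a hypothetical helper. With 243 columns the default batch is 1215, and 1215 * 1,800,000 is already past 2^31 - 1, which matches where the error starts showing.

```python
import numpy as np

INT32_MAX = np.iinfo(np.int32).max  # 2**31 - 1 == 2147483647


def max_safe_batch_size(n_rows):
    """Hypothetical helper: largest batch_size with batch_size * n_rows <= 2**31 - 1."""
    return INT32_MAX // n_rows


n_rows, n_features = 1_800_000, 243
default_batch = 5 * n_features  # IncrementalPCA's default when batch_size=None

# int32 array arithmetic wraps around silently: the product becomes negative,
# and np.sqrt of a negative number is NaN, which is what ends up reaching svd().
seen = np.array([n_rows], dtype=np.int32)
batch = np.array([default_batch], dtype=np.int32)
print(seen * batch)                  # a negative number: the int32 product overflowed
print(max_safe_batch_size(n_rows))   # 1193
```

Under these assumptions, `IncrementalPCA(n_components=4, batch_size=max_safe_batch_size(len(data)))` keeps the product inside int32. Changing the data's dtype would not help, since the overflowing values are sample counts rather than the data itself, which is consistent with Edit 2.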
Something is wrong with your data.
Here is a 100% working example using some artificial data (n=2000000 and d=243).
To help more, upload a sample of your data that results in the error.
from sklearn.decomposition import IncrementalPCA
import pandas as pd, numpy as np
n=2000000
d=243
data = pd.DataFrame(np.ones((n,d)))
ipca = IncrementalPCA(n_components=4)
ipca.fit(data.values)

Error while saving a matplotlib animation, missing 'dpi' argument

I'm trying to save an animation created with matplotlib.animation.FuncAnimation, and I get an error saying the 'dpi' argument is missing. I do pass dpi, so I don't understand where this error comes from.
I'm running Python 3.6 and matplotlib 3.0.3, and I just installed ffmpeg from the official Ubuntu repositories (Ubuntu 18.04).
This is the part of my code that should be relevant, although I suspect the cause is something in the system:
Writer = writers['ffmpeg']
writer = Writer(fps=15, metadata=dict(artist='Me'), bitrate=1800,)
ani = FuncAnimation(fig, anime, interval=time_step * 10**3,
                    frames=F, repeat=False,)
ani.save('standard_map.mp4', writer=Writer, dpi=100)
The error is:
with writer.saving(self._fig, filename, dpi):
File "/usr/lib/python3.6/contextlib.py", line 159, in helper
return _GeneratorContextManager(func, args, kwds)
File "/usr/lib/python3.6/contextlib.py", line 60, in __init__
self.gen = func(*args, **kwds)
TypeError: saving() missing 1 required positional argument: 'dpi'
I tried adding the lines suggested there, and the error is still the same:
plt.rcParams['animation.ffmpeg_path'] = '/usr/bin/ffmpeg'
I also tried changing the writer to 'imagemagick', the default one on Ubuntu, and the error persists.
There's no dpi parameter passed to ani.save(); provide it:
ani.save('standard_map.mp4', writer=Writer, dpi=100)
This worked for me:
import matplotlib.animation as animation
writer = animation.FFMpegFileWriter(fps=15, metadata=dict(artist='Me'), bitrate=1800)
On macOS, you need blit=False.
I tried to reproduce the problem using the example from the matplotlib documentation:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
import matplotlib.animation as animation
fig, ax = plt.subplots()
xdata, ydata = [], []
ln, = plt.plot([], [], 'ro')
def init():
    ax.set_xlim(0, 2*np.pi)
    ax.set_ylim(-1, 1)
    return ln,
def update(frame):
    xdata.append(frame)
    ydata.append(np.sin(frame))
    ln.set_data(xdata, ydata)
    return ln,
writers = animation.writers
writer = writers['ffmpeg']
ani = FuncAnimation(fig, update, frames=np.linspace(0, 2*np.pi, 128),
                    init_func=init, blit=True)
ani.save("test.mp4",writer=writer)
This will indeed give you the exception from above:
TypeError: saving() missing 1 required positional argument: 'dpi'
There's an easy solution: You need an instance of the writer rather than the writer class itself. Thus replace
writer = writers['ffmpeg']
with
writer = writers['ffmpeg']()
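Here is a self-contained version of the documentation example with that fix applied. `animation.writers[name]` returns the writer class; calling it gives the instance that `ani.save` expects. Assumptions beyond the answer: `save_sine_animation` is a hypothetical helper, the Agg backend is used because we only write a file, and if ffmpeg is not installed the sketch falls back to the `'pillow'` writer and a GIF (recent matplotlib releases already depend on Pillow), so the same class-versus-instance pattern can be tried either way.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; we only write a file
import matplotlib.animation as animation
import matplotlib.pyplot as plt
import numpy as np


def save_sine_animation(path_without_ext, dpi=100):
    """Hypothetical helper: render the sine example, save it, return the file path."""
    fig, ax = plt.subplots(figsize=(3, 2))
    ax.set_xlim(0, 2 * np.pi)
    ax.set_ylim(-1, 1)
    (ln,) = ax.plot([], [], "ro")
    xdata, ydata = [], []

    def update(frame):
        xdata.append(frame)
        ydata.append(np.sin(frame))
        ln.set_data(xdata, ydata)
        return (ln,)

    ani = animation.FuncAnimation(
        fig, update, frames=np.linspace(0, 2 * np.pi, 16), blit=False
    )

    # Prefer ffmpeg as in the question; fall back to Pillow (GIF) if it is missing.
    if animation.writers.is_available("ffmpeg"):
        name, ext = "ffmpeg", ".mp4"
    else:
        name, ext = "pillow", ".gif"
    writer_cls = animation.writers[name]  # the writer *class*
    writer = writer_cls(fps=15)           # an *instance*, which is what save() needs
    path = path_without_ext + ext
    ani.save(path, writer=writer, dpi=dpi)
    plt.close(fig)
    return path


with tempfile.TemporaryDirectory() as tmp:
    out = save_sine_animation(os.path.join(tmp, "test"))
    print(os.path.basename(out), os.path.getsize(out) > 0)
```

Passing `writer=writer_cls` (the class) instead of `writer` is what reproduces the `saving() missing 1 required positional argument: 'dpi'` error from the question.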
From this guide.
To write to a file, you should use the Agg backend for matplotlib.
Put the following at the top of your code, before importing matplotlib.pyplot:
import matplotlib
matplotlib.use("Agg")

Stop x-axis labels from shrinking the plot in Matplotlib?

I'm trying to make a bar graph with the following code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
test = {'names':['a','b','abcdefghijklmnopqrstuvwxyz123456789012345678901234567890'], 'values':[1,2,3]}
df = pd.DataFrame(test)
plt.rcParams['figure.autolayout'] = False
ax = sns.barplot(x='names', y='values', data=df)
ax.set_xticklabels(ax.get_xticklabels(), rotation=90)
plt.show()
But I get the following error, because the long value in 'names', used as an x-axis label, makes the plot shrink until its bottom is above its top:
Traceback (most recent call last):
File "C:/Users/Adam/.PyCharm2018.2/config/scratches/scratch.py", line 11, in <module>
plt.show()
File "C:\Anaconda3\lib\site-packages\matplotlib\pyplot.py", line 253, in show
return _show(*args, **kw)
File "C:\Program Files\JetBrains\PyCharm 2018.2.3\helpers\pycharm_matplotlib_backend\backend_interagg.py", line 25, in __call__
manager.show(**kwargs)
File "C:\Program Files\JetBrains\PyCharm 2018.2.3\helpers\pycharm_matplotlib_backend\backend_interagg.py", line 107, in show
self.canvas.show()
File "C:\Program Files\JetBrains\PyCharm 2018.2.3\helpers\pycharm_matplotlib_backend\backend_interagg.py", line 62, in show
self.figure.tight_layout()
File "C:\Anaconda3\lib\site-packages\matplotlib\figure.py", line 2276, in tight_layout
self.subplots_adjust(**kwargs)
File "C:\Anaconda3\lib\site-packages\matplotlib\figure.py", line 2088, in subplots_adjust
self.subplotpars.update(*args, **kwargs)
File "C:\Anaconda3\lib\site-packages\matplotlib\figure.py", line 245, in update
raise ValueError('bottom cannot be >= top')
ValueError: bottom cannot be >= top
Here is what it looks like if I reduce the length of that name slightly:
How can I get it to expand the figure to fit the label instead of shrinking the axes?
One workaround is to create the Axes instance yourself with fig.add_axes instead of as a subplot. Then tight_layout() has no effect, even if it's called internally. You can then pass the Axes to sns.barplot with the ax keyword. The remaining problem is that with plt.show() the label may be cut off, but if you call savefig with bbox_inches='tight', the saved image is expanded to contain both the axes and all labels:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
fig = plt.figure()
ax = fig.add_axes([0,0,1,1])
test = {'names':['a','b','abcdefghijklmnopqrstuvwxyz123456789012345678901234567890'], 'values':[1,2,3]}
df = pd.DataFrame(test)
#plt.rcParams['figure.autolayout'] = False
ax = sns.barplot(x='names', y='values', data=df, ax=ax)
ax.set_xticklabels(ax.get_xticklabels(), rotation=90)
#plt.show()
fig.savefig('long_label.png', bbox_inches='tight')
DISCLAIMER: I don't have PyCharm, so this code assumes that matplotlib behaves the same with and without PyCharm. Anyway, for me the outcome looks like this:
If you want this in an interactive backend, I didn't find any other way than manually adjusting the figure size. This is what I get using the qt5agg backend:
ax = sns.barplot(x='names', y='values', data=df)
ax.set_xticklabels(ax.get_xticklabels(), rotation=90)
ax.figure.set_size_inches(5, 8) # manually adjust figure size
plt.tight_layout() # automatically adjust elements inside the figure
plt.show()
Note that PyCharm's scientific mode might be doing some magic that prevents this from working, so you might need to deactivate it or just run the script outside PyCharm.
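The manual figure-size adjustment can be automated. A sketch under stated assumptions: draw once, measure how far the tick labels reach below the axes with `ax.get_tightbbox`, then make the figure taller by that amount while keeping the axes' height in inches. `grow_figure_for_xlabels` is a hypothetical helper, plain `ax.bar` replaces `sns.barplot` to keep the example self-contained, and it has not been tried inside PyCharm's backend (which calls tight_layout() on show, though a taller figure should leave it room).

```python
import matplotlib
matplotlib.use("Agg")  # the sketch measures text with the Agg renderer
import matplotlib.pyplot as plt


def grow_figure_for_xlabels(fig, ax, pad_in=0.1):
    """Hypothetical helper: make fig taller so ax's x-axis labels fit below it.

    The axes keeps its height in inches; only the figure grows.
    Returns the new figure height in inches.
    """
    fig.canvas.draw()  # text extents are only known after a draw
    renderer = fig.canvas.get_renderer()
    dpi = fig.dpi
    fig_w, fig_h = fig.get_size_inches()

    ax_box = ax.bbox                     # axes rectangle in display pixels
    tight = ax.get_tightbbox(renderer)   # axes plus ticks, tick labels, axis labels
    below_in = (ax_box.y0 - tight.y0) / dpi + pad_in  # space needed under the axes
    ax_h_in = ax_box.height / dpi
    top_in = fig_h - ax_box.y1 / dpi     # keep the existing top margin

    new_h = below_in + ax_h_in + top_in
    fig.set_size_inches(fig_w, new_h)
    pos = ax.get_position()              # figure fractions; keep the horizontal part
    ax.set_position([pos.x0, below_in / new_h, pos.width, ax_h_in / new_h])
    return new_h


names = ['a', 'b', 'abcdefghijklmnopqrstuvwxyz123456789012345678901234567890']
values = [1, 2, 3]
fig, ax = plt.subplots(figsize=(5, 4))
ax.bar(names, values)
ax.tick_params(axis='x', labelrotation=90)
print(round(grow_figure_for_xlabels(fig, ax), 2))  # new height in inches
# fig.savefig('long_label_grown.png')
```

Unlike bbox_inches='tight', which only affects the saved file, this changes the figure itself, so the extra space is also there when an interactive backend shows it.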

Can't get the legend to show correctly on the chart

My legend shows up in the top right, but rather than saying AAPL and IBM, it shows a single letter. I can't figure out what's wrong.
import quandl
import pandas as pd
import matplotlib.pyplot as plt
def get_mean_volume(symbol):
    df = quandl.get("YAHOO/"+str(symbol))[::-1]
    return df[['High', 'Adjusted Close']]
stock = ['AAPL', 'IBM']
for s in stock:
    plt.plot(get_mean_volume(s))
    plt.legend(s)
plt.ylabel('Price')
plt.xlabel('Date')
This is from the matplotlib.legend() documentation.
To make a legend for lines which already exist on the axes (via plot
for instance), simply call this function with an iterable of strings,
one for each legend item. For example:
plt.plot([1, 2, 3])
plt.legend(['A simple line'])
You should probably also add a plt.show().
So, since you don't use any labels, I think you should use:
plt.legend([s])
You probably see only one letter because legend() iterates over its input: with s = "AAPL", it takes the first item, s[0] == 'A', as the label text for line 1.
On the second iteration of the loop the same happens with 'I' (because s[0] == 'I' in that case, s[1] == 'B', and so on).
legend() is quite customizable; check the matplotlib docs.
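A sketch of an alternative to passing a list to legend(): attach a label to each line with the `label=` keyword and call `ax.legend()` once after the loop. It also shows why a bare string gives single letters. The data is a hypothetical stand-in for `get_mean_volume()` (the quandl call is not reproduced); the nested loop mirrors the fact that the question's function returns two columns per symbol, so there are four lines in total.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# legend() treats its argument as an iterable of labels, one per line,
# so a bare string is split into single characters:
print(list("AAPL"))  # ['A', 'A', 'P', 'L']

# Hypothetical stand-in for get_mean_volume(): two columns per symbol.
days = np.arange(5)
frames = {
    "AAPL": pd.DataFrame({"High": days * 2.0, "Adjusted Close": days * 1.5}),
    "IBM": pd.DataFrame({"High": days + 3.0, "Adjusted Close": days + 2.5}),
}

fig, ax = plt.subplots()
for symbol, df in frames.items():
    for column in df.columns:
        # Attach the label to each line as it is drawn.
        ax.plot(df.index, df[column], label=f"{symbol} {column}")
legend = ax.legend()  # one call after the loop collects every label
ax.set_ylabel("Price")
ax.set_xlabel("Date")
print([t.get_text() for t in legend.get_texts()])
# ['AAPL High', 'AAPL Adjusted Close', 'IBM High', 'IBM Adjusted Close']
```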
So this is the result for me:
import matplotlib.pyplot as plt
stock = ['AAPL']
for s in stock:
    plt.plot([1,2,3])
    plt.legend([s])
plt.ylabel('Price')
plt.xlabel('Date')
plt.show()
Results in:
