Too long processing time for a 3-d plot

I am comparatively new to Python, so I cannot tell whether something is wrong with my code or whether the process simply takes this long to complete.
I wrote code to plot a large dataset (a 3-D array) as a 3-D plot, but my PC takes forever to finish, if it finishes at all. I have already waited about an hour for it to complete.
a = pd.DataFrame(np.array([Ensemble_test,df['RF'],y])).transpose()
a  # a dataset with dimensions 335516 rows × 3 columns
### All 3 columns are numeric
Output:
            0           1          2
0  172.981614  130.624674 -42.356940
1  189.851754  139.632304 -50.219450
## I tried plotting using following
from mpl_toolkits.mplot3d import Axes3D
df=a.unstack().reset_index()
df.columns=["X","Y","Z"]
df['X']=pd.Categorical(df['X'])
df['X']=df['X'].cat.codes
# Make the plot
fig = plt.figure(figsize = (8,8))
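# fig.gca(projection='3d') was removed in recent Matplotlib releases; add_subplot works across versions
ax = fig.add_subplot(projection='3d')
im = ax.plot_trisurf(df['Y'], df['X'], df['Z'], cmap='Spectral', linewidth=0.001,
                     vmin=-30, vmax=30, antialiased=True)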
ax.view_init(40,20)
#fig.colorbar(im, ax=ax, fraction = 0.023)
ax.set_ylabel('RD')
ax.set_zlabel('Difference')
ax.set_xlabel('Ensemble')
I wanted to have a 3-d plot but the process takes too long. I don't know what the problem is.
Any other alternatives/suggestions for 3-d plotting are also welcome.
[My PC is 8th gen 'i7' with '16 GB' RAM]
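One likely contributor to the run time: `a.unstack()` turns the 335516 × 3 frame into 1,006,548 rows, and `plot_trisurf` has to triangulate and draw every one of those points. As a sketch of one possible alternative (not a tested fix for the exact data above), the example below draws a random subsample as a 3-D scatter instead of a surface. The arrays `ensemble`, `rf` and `diff` are synthetic stand-ins for `Ensemble_test`, `df['RF']` and `y`, and the sample size of 5,000 is an arbitrary assumption.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers the 3d projection on older Matplotlib)

rng = np.random.default_rng(0)
n_rows = 335_516

# Synthetic stand-ins for Ensemble_test, df['RF'] and y (assumption, not the real data)
ensemble = rng.normal(180, 20, n_rows)
rf = rng.normal(135, 15, n_rows)
diff = rf - ensemble

# Keep a random subset of rows so the plot stays cheap to draw
sample_size = 5_000
keep = rng.choice(n_rows, size=sample_size, replace=False)

fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(projection="3d")
sc = ax.scatter(ensemble[keep], rf[keep], diff[keep],
                c=diff[keep], cmap="Spectral", vmin=-30, vmax=30, s=2)
ax.view_init(40, 20)
ax.set_xlabel("Ensemble")
ax.set_ylabel("RD")
ax.set_zlabel("Difference")
fig.colorbar(sc, ax=ax, fraction=0.023)
```

Raising `sample_size` gives a denser picture at the cost of drawing time; whether a subsample is representative enough depends on the data.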

Curve fitting for large datasets in Python

I have a very large set of data (around 100k points) and I want to fit a curve to it.
I tried the filters suggested by answers to another question, but that led to overfitting.
I am using numpy and matplotlib as of now.
This is the type of scatter plot I am trying to fit.
Edit 1:
Please ignore the data points to the side of the central main set of data points (thus only a single curve can fit this).
Here is the dataset; download the file as a text file to separate the columns. Consider columns 3 and 9 (1-based indexing): the y-axis is column 3, while the x-axis is the difference of column 3 and column 9.
Edit 2: Ignore the negative values.
Edit 3: As there appears to be a lot of noise, consider column 33, which gives a probability, and keep only stars with >90% probability.
Here are comparison scatterplots made from the data in your link, along with the Python code I used to read, parse, and plot the data. Note that my plot also has an inverted y-axis for direct comparison. This shows me that the data in the posted link, parsed per your directions, cannot be fit as described in your question. My hope is that you can find some error in my work and that a model can in fact be made.
import matplotlib.pyplot as plt
dataFileName = 'temp.dat'
dataCount = 0
xlist = []
ylist = []
with open(dataFileName) as f:
    for line in f:
        if line[0] == '#':  # comments
            continue
        spl = line.split()
        col3 = float(spl[2])
        col9 = float(spl[8])
        if col3 < 0.0 or col9 < 0.0:
            continue
        x = abs(col3 - col9)
        y = col3
        xlist.append(x)
        ylist.append(y)
f = plt.figure()
axes = f.add_subplot(111)
axes.invert_yaxis()
axes.scatter(xlist, ylist, color='black', marker='o', lw=0, s=1)
plt.show()

Matplotlib RuntimeError: exceeds Locator.MAXTICKS when using MultipleLocator

I am plotting a Matplotlib chart with 10000 x axis data points. To avoid the X axis labels overlapping, I have used a Major MultipleLocator of 40 and a minor MultipleLocator of 10. This code works for 1000 data points.
from matplotlib import pyplot as plt
import numpy as np
import matplotlib.ticker as mticker
## generating the data points (9999 values, years 1 to 9999)
years = [i for i in range(1,10000)]
data = np.random.rand(len(years))
fig, ax = plt.subplots(figsize = (18,6))
ind = np.arange(len(data))
bars1 = ax.bar(ind, data, label='Data')
ax.set_title("Data vs Year")
#Format Y Axis
ax.set_ylabel("Data")
ax.set_ylim((0,1))
#Format X Axis
ax.set_xticks(range(0,len(ind)))
ax.set_xticklabels(years)
ax.set_xlabel("Years")
ax.xaxis.set_major_locator(mticker.MultipleLocator(40))
ax.xaxis.set_major_formatter(mticker.FormatStrFormatter('%d'))
ax.xaxis.set_minor_locator(mticker.MultipleLocator(10))
fig.autofmt_xdate()
ax.xaxis_date()
plt.tight_layout()
plt.show()
The code above produces the following error:
RuntimeError: Locator attempting to generate 1102 ticks from -510.0 to 10500.0: exceeds Locator.MAXTICKS
Can you please tell me the error in this chart?
First of all, you should remove these two lines:
ax.set_xticks(range(0,len(ind)))
ax.set_xticklabels(years)
These lines set roughly 10000 ticks up front. Since you then call ax.xaxis.set_major_locator() and ax.xaxis.set_minor_locator(), the two lines are not needed. After that, the line ax.xaxis.set_minor_locator(mticker.MultipleLocator(10)) tries to generate 1102 ticks, which is more than the limit (mticker.Locator.MAXTICKS == 1000). In my testing, the argument needs to be at least 12 to stay under the limit.
Passing a larger argument to mticker.MultipleLocator() produces fewer ticks.
If you really do need about 277 major ticks (spacing 40) and 1102 minor ticks (spacing 10), you can raise the limit with mticker.Locator.MAXTICKS = 2000
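As a rough sketch of the advice above (not the answerer's exact code), the example below drops the explicit `set_xticks`/`set_xticklabels` calls and lets the locators pick the ticks. The spacings 500 and 100 are illustrative assumptions chosen to stay far below `Locator.MAXTICKS`, and the bars are placed directly at the year values so no relabelling is needed.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import numpy as np

years = np.arange(1, 10000)
data = np.random.default_rng(0).random(len(years))

fig, ax = plt.subplots(figsize=(18, 6))
ax.bar(years, data, width=1.0, label="Data")
ax.set_title("Data vs Year")
ax.set_xlabel("Years")
ax.set_ylabel("Data")
ax.set_ylim(0, 1)

# Let the locators decide where ticks go; spacings are illustrative assumptions
ax.xaxis.set_major_locator(mticker.MultipleLocator(500))
ax.xaxis.set_minor_locator(mticker.MultipleLocator(100))
ax.xaxis.set_major_formatter(mticker.FormatStrFormatter("%d"))

# Count the ticks each locator produces for the current x-range
n_major = len(ax.xaxis.get_majorticklocs())
n_minor = len(ax.xaxis.get_minorticklocs())
print(n_major, n_minor, mticker.Locator.MAXTICKS)
```

Counting the ticks this way before rendering is a quick check that a chosen spacing fits under the limit for a given axis range.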

Matplotlib - sequentially creating figures with the same size

I need to create a sequence of .pdf files where each .pdf contains a figure with five plots.
As I am going to include them in a LaTeX article, I wanted them all to be the same width and height so that each figure's corners are vertically aligned on both left and right sides.
I thought this would be enough, but apparently not:
common_figsize=(6,5)
fig, ax = plt.subplots(figsize = common_figsize)
# five plots in a loop for the first figure.
# my_code()...
plt.savefig("Figure-1.pdf", transparent=True)
plt.close(fig)
fig, ax = plt.subplots(figsize = common_figsize)
# five plots in a loop for the new figure.
# my_code()...
plt.savefig("Figure-2.pdf", transparent=True)
plt.close(fig)
If I understand correctly, this does not do exactly what I want because of different scales originating from different yticks resolutions.
For both figures, pyplot is fed the same list for xticks.
In this case, it is a list of 50 values, from 1 to 50.
CHUNK_COUNT = 50
x_step = CHUNK_COUNT // 10  # integer division: range() needs an int step in Python 3
new_xticks = list(range(x_step, CHUNK_COUNT + x_step, x_step)) + [1]
plt.xticks(new_xticks)
ax.set_xlim(left=1, right=CHUNK_COUNT)
This creates both figures with an X-axis that goes from 1 to 50.
So far so good.
However, I haven't figured out how to deal with the problem of yticks resolution.
One of the figures had fewer yticks than the other, so I overrode it to have as many ticks as the other:
# Add yticks to Figure 1.
y_divisor = 6
y_step = (100 - min_y_tick) / y_divisor
new_yticks = [min_y_tick + y_step * i for i in range(0, y_divisor + 1)]
plt.yticks(new_yticks)
This resulted in the following images (Figure 1 and Figure 2, not reproduced here). Viewed at full size, the bounding square of each figure is in fact different.
In summary, I believe matplotlib is accepting the figsize parameter, but then rearranges plot elements to accommodate for different tick values and text lengths.
Is it possible for it to operate in reverse? To change label and text rotations automagically so that the squares are absolutely the same length and height?
Apologies if this is a duplicate and thanks for the help.
EDIT:
Finally able to provide a minimal, complete and verifiable example.
Among the tests, I removed the custom yticks code and the problem still persists:
from matplotlib.lines import Line2D
import matplotlib.ticker as mtick
import matplotlib.pyplot as plt
from matplotlib import rc
# activate latex text rendering
rc('text', usetex=True)
from matplotlib import rcParams
rcParams.update({'figure.autolayout': True})
CHUNK_COUNT = 50
common_figsize=(6,5)
plot_counter = 5
x_step = int(int(CHUNK_COUNT) / 10)
new_xticks = list(range(x_step, int(CHUNK_COUNT) + x_step, x_step)) + [1]
##### Plot Figure 1
fig, ax = plt.subplots(figsize = common_figsize)
plt.ylabel("Summary of a simple YY axis")
plt.yticks(rotation=45)
ax.yaxis.set_major_formatter(mtick.PercentFormatter(is_latex=False))
for i in range(0, plot_counter):
    xvals = range(1, CHUNK_COUNT + 1)
    yvals = []
    for j in xvals:
        yvals.append(j + i)
    plt.plot(xvals, yvals)
plt.xticks(new_xticks)
ax.set_xlim(left=1, right=int(CHUNK_COUNT))
plt.savefig("Figure_1.png", transparent=True)
plt.close(fig)
##### Plot Figure 2
fig, ax = plt.subplots(figsize = common_figsize)
plt.ylabel("Summary of another YY axis")
plt.yticks(rotation=45)
ax.yaxis.set_major_formatter(mtick.PercentFormatter(is_latex=False))
for i in range(0, plot_counter):
    xvals = range(1, CHUNK_COUNT + 1)
    yvals = []
    for j in xvals:
        yvals.append((j + i) / 100)
    plt.plot(xvals, yvals)
plt.xticks(new_xticks)
ax.set_xlim(left=1, right=int(CHUNK_COUNT))
plt.savefig("Figure_2.png", transparent=True)
plt.close(fig)
It turns out this was due to a mistake on my part.
I carried over code from another context where autolayout was active:
from matplotlib import rcParams
rcParams.update({'figure.autolayout': True})
After setting it to False, the figure squares all had the same dimensions:
from matplotlib import rcParams
rcParams.update({'figure.autolayout': False})
Despite the length differences in ytick elements, it is now respecting the dimensions specified in my original question.
These results were generated with the MWE I added at the end of my question.
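One way to check this behaviour is sketched below, under the assumption that comparing `ax.get_position()` after a draw is a fair proxy for "the squares line up". The helper `axes_bounds` and the scale factors 1 and 1000 are hypothetical, chosen only to produce short and long y tick labels; LaTeX rendering is left off to keep the example self-contained.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt

def axes_bounds(scale, autolayout):
    """Return the axes rectangle (left, bottom, width, height) in figure fractions."""
    with plt.rc_context({"figure.autolayout": autolayout}):
        fig, ax = plt.subplots(figsize=(6, 5))
        xvals = range(1, 51)
        ax.plot(xvals, [scale * j for j in xvals])
        ax.set_ylabel("Summary of a simple YY axis")
        fig.canvas.draw()  # automatic layout is only applied when the figure is drawn
        bounds = tuple(round(v, 4) for v in ax.get_position().bounds)
        plt.close(fig)
    return bounds

# autolayout off: the axes rectangle does not depend on tick label width
fixed_small = axes_bounds(1, autolayout=False)
fixed_large = axes_bounds(1000, autolayout=False)

# autolayout on: the rectangle is adjusted to make room for the labels
auto_small = axes_bounds(1, autolayout=True)
auto_large = axes_bounds(1000, autolayout=True)

print(fixed_small, fixed_large)
print(auto_small, auto_large)
```

With autolayout disabled, long tick labels can run off the figure edge, so the margins may need to be set by hand (for example with `fig.subplots_adjust`) once for all figures.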

Having trouble with multiple figures on pyplot

I am currently going through the Kaggle Titanic Machine Learning thing and using http://nbviewer.jupyter.org/github/donnemartin/data-science-ipython-notebooks/blob/master/kaggle/titanic.ipynb to figure it out as I am a relative beginner to Python. I thought I understood what the first few steps were doing and I am trying to recreate an earlier step by making a figure with multiple plots on it. I can't seem to get the plots to actually show up.
Here is my code:
import pandas as pd
import numpy as np
import pylab as plt
train=pd.read_csv("train.csv")
#Set the global default size of matplotlib figures
plt.rc('figure', figsize=(10, 5))
#Size of matplotlib figures that contain subplots
figsize_with_subplots = (10, 10)
# Size of matplotlib histogram bins
bin_size = 10
females_df = train[train['Sex']== 'female']
print("females_df", females_df)
females_xt = pd.crosstab(females_df['Pclass'],train['Survived'])
females_xt_pct = females_xt.div(females_xt.sum(1).astype(float), axis = 0)
males = train[train['Sex'] == 'male']
males_xt = pd.crosstab(males['Pclass'], train['Survived'])
males_xt_pct= males_xt.div(males_xt.sum(1).astype(float), axis = 0)
plt.figure(5)
plt.subplot(221)
females_xt_pct.plot(kind='bar', title='Female Survival Rate by Pclass')
plt.xlabel('Passenger Class')
plt.ylabel('Survival Rate')
plt.subplot(222)
males_xt_pct.plot(kind='bar', title= 'Male Survival Rate by Pclass')
plt.xlabel('Passenger Class')
plt.ylabel('Survival Rate')
And this is displaying two blank plots separately (one in the 221 location, and then next plot on a new figure in the 222 location) and then another plot with males that actually works at the end. What am I doing wrong here?
In order to draw a pandas plot on a previously created subplot, you may use the ax argument of the pandas plotting function.
ax=plt.subplot(..)
df.plot(..., ax=ax)
So in this case the code may look like
plt.figure(5)
ax=plt.subplot(221)
females_xt_pct.plot(kind='bar', title='Female Survival Rate by Pclass',ax=ax)
ax2=plt.subplot(222)
males_xt_pct.plot(kind='bar', title= 'Male Survival Rate by Pclass',ax=ax2)
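For a version that runs without the Kaggle file, here is a minimal sketch of the same idea. The small `train` frame is a made-up stand-in for `train.csv`, and the use of `plt.subplots` with stacked bars is an illustrative choice rather than something taken from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for train.csv (values are invented for illustration)
train = pd.DataFrame({
    "Sex": ["female", "female", "female", "female", "male", "male", "male", "male"],
    "Pclass": [1, 2, 3, 3, 1, 2, 3, 3],
    "Survived": [1, 1, 0, 1, 0, 1, 0, 0],
})

# Create the figure and both axes first, then hand each axes to pandas
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
for ax, sex in zip(axes, ["female", "male"]):
    subset = train[train["Sex"] == sex]
    xt = pd.crosstab(subset["Pclass"], subset["Survived"])
    xt_pct = xt.div(xt.sum(axis=1).astype(float), axis=0)
    xt_pct.plot(kind="bar", stacked=True, ax=ax,
                title=f"{sex.capitalize()} Survival Rate by Pclass")
    ax.set_xlabel("Passenger Class")
    ax.set_ylabel("Survival Rate")
fig.tight_layout()
```

Because every pandas call receives an explicit `ax`, no extra figures are created and the labels are set on the axes that actually holds the bars.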

Matplotlib sliding window not plotting correctly

I have code that computes a rolling-window (30) average over a range of 300.
So I have 10 averages, but they plot against ticks 1-10 rather than being spaced over each window of 30.
The only way I can get it to look right is to plot it over (len(windowlength)), but then the x-axis isn't right.
Is there any way to manually space the results?
windows30 = (sliding_window(sequence, 30))
Overall_Mean = mean(sequence)
fig, (ax) = plt.subplots()
plt.subplots_adjust(left=0.07, bottom=0.08, right=0.96, top=0.92, wspace=0.20, hspace=0.23)
ax.set_ylabel('mean (%)')
ax.set_xlabel(' Length') # axis titles
ax.yaxis.grid(True, linestyle='-', which='major', color='lightgrey', alpha=0.5)
ax.plot(windows30, color='r', marker='o', markersize=3)
ax.plot([0, len(sequence)], [Overall_Mean, Overall_Mean], lw=0.75)
plt.show()
From what I understand, you have a list of length 300 that holds only 10 values. If that is the case, you can remove the other values, which are None, from your windows30 list using the following solution.
Code Demonstration:
import numpy as np
import random
import matplotlib.pyplot as plt
# Generating the list of Nones and numbers
listofzeroes = [None] * 290
numbers = random.sample(range(50), 10)
numbers.extend(listofzeroes)
# Removing Nones from the list
numbers = [value for value in numbers if value is not None]
step = len(numbers)
x_values = np.linspace(0,300,step) # Generate x-values
plt.plot(x_values,numbers, color='red', marker='o')
This is a working example, the relevant code for you is after the second comment.
The above code will work independently of where the Nones are located in your list. I hope this solves your problem.
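A related sketch, assuming the 10 averages come from consecutive, non-overlapping windows of 30. The question does not show `sliding_window`, so `window_means` below is a hypothetical stand-in. Each mean is plotted at the centre of the window it summarises, so the points share the 0 to 300 x-axis with the overall-mean line.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

def window_means(sequence, window):
    """Mean of each consecutive, non-overlapping window (assumed behaviour)."""
    values = np.asarray(sequence, dtype=float)
    n_windows = len(values) // window
    trimmed = values[: n_windows * window]  # drop any incomplete trailing window
    return trimmed.reshape(n_windows, window).mean(axis=1)

sequence = np.arange(300)  # placeholder data
window = 30

means = window_means(sequence, window)
# x position of each mean: the centre of its window (15, 45, ..., 285)
centres = np.arange(len(means)) * window + window / 2
overall_mean = sequence.mean()

fig, ax = plt.subplots()
ax.plot(centres, means, color="r", marker="o", markersize=3)
ax.plot([0, len(sequence)], [overall_mean, overall_mean], lw=0.75)
ax.set_xlabel("Length")
ax.set_ylabel("mean (%)")
```

If the windows overlap instead, the same approach applies with `centres` computed from each window's actual start index.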
