I'm working on a telemetry system, and I would like to click on each scatter point in my plot and see its pair of coordinates.
My plot is a time series, so I'm having a hard time displaying each date with datacursor. I'm currently using this line
plt.gca().fmt_xdata = matplotlib.dates.DateFormatter('%H:%M:%S')
which confirms that my X axis is date-based.
I have already tried like this:
datacursor(ax1, formatter = 'Valor medido : {y:.6f} às {x:.6f}'.format)
The output is OK for Y, but the date comes out as an "epoch number", like "57990.011454".
After a little research, I can convert this number with:
matplotlib.dates.num2date(d).strftime('%H:%M:%S')
but I'm failing to put it all together to display in my cursor.
Thanks in advance!
formatter= accepts any function that returns a string. You could therefore write (code untested because you did not provide a Minimal, Complete, and Verifiable example):
def print_coords(**kwargs):
    return 'Valor medido : {y:.6f} às {x:s}'.format(
        y=kwargs['y'],
        x=matplotlib.dates.num2date(kwargs['x']).strftime('%H:%M:%S'))

datacursor(ax1, formatter=print_coords)
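The num2date conversion itself can be checked in isolation (a minimal sketch; 737000.5 is just an arbitrary Matplotlib date number, and the fractional part of the number is the time of day):

```python
import matplotlib.dates as mdates

# an arbitrary Matplotlib date number; .5 of a day corresponds to noon UTC
x = 737000.5
print(mdates.num2date(x).strftime('%H:%M:%S'))  # -> 12:00:00
```
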
I have a somewhat complex mathematical function and I want to find the meeting point of two graphs. I went through some answers but couldn't get what I wanted. Code below:
import matplotlib.pyplot as plt
from scipy.special import gamma

mu = [0, 1, 2, 3, 4, 5]
b_value = []
N_0 = 8000
c = 3.78
b = [9.922]*6  # straight line
for i in range(len(mu)):
    # use a separate name here so the straight line `b` is not overwritten
    b_i = ((3.6*(c**3)*3.14*1000*N_0*(10**-6)/12)*gamma(6.01+mu[i])) / \
          (6*3.14*c*N_0*(10**-4)*gamma(4.67+mu[i]))**((6.01+mu[i])/(4.67+mu[i]))
    b_value.append(b_i)
plt.figure(570)
plt.scatter(mu, b_value, color='r', lw=2, label='b_theoretical', marker="o")
plt.plot(mu, b, ':', color='b', lw=2, label='b')
Basically I want to find the value of their intersection. The scatter points could be joined as a line as well.
Thanks in advance.
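One common approach to this kind of problem is to look for a sign change in the difference of the two curves and linearly interpolate between the bracketing samples. A sketch with illustrative numbers (not the actual b values from the question):

```python
import numpy as np

# illustrative data standing in for mu, b_value and the constant line b = 9.922
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y1 = np.array([2.0, 5.0, 8.0, 11.0, 14.0, 17.0])  # stand-in for b_value
y2 = np.full_like(y1, 9.922)                       # the straight line

diff = y1 - y2
# indices where the sign of the difference flips, i.e. the curves cross
crossings = np.where(np.diff(np.sign(diff)) != 0)[0]
for i in crossings:
    # linear interpolation between the two samples that bracket the crossing
    x_cross = x[i] - diff[i] * (x[i + 1] - x[i]) / (diff[i + 1] - diff[i])
    print(x_cross)
```

For a smooth analytic function you could instead use scipy.optimize.brentq on the difference, but the sign-change scan works directly on sampled values like these.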
I am having a bit of an issue with pandas's rolling function and I'm not quite sure where I'm going wrong. If I mock up two test series of numbers:
import numpy as np
import pandas as pd

df_index = pd.date_range(start='1990-01-01', end='2010-01-01', freq='D')
test_df = pd.DataFrame(index=df_index)
test_df['Series1'] = np.random.randn(len(df_index))
test_df['Series2'] = np.random.randn(len(df_index))
Then it's easy to have a look at their rolling annual correlation:
test_df['Series1'].rolling(365).corr(test_df['Series2']).plot()
which produces a reasonable-looking plot. All good so far. If I then try to do the same thing using a datetime offset:
test_df['Series1'].rolling('365D').corr(test_df['Series2']).plot()
I get a wildly different (and obviously wrong) result.
Is there something wrong with pandas or is there something wrong with me?
Thanks in advance for any light you can shed on this troubling conundrum.
It's subtle: the behavior of an int window and an offset window is different. From the docs:
New in version 0.19.0 are the ability to pass an offset (or
convertible) to a .rolling() method and have it produce variable sized
windows based on the passed time window. For each time point, this
includes all preceding values occurring within the indicated time
delta.
This can be particularly useful for a non-regular time frequency index.
You should check out the docs on Time-aware Rolling.
r1 = test_df['Series1'].rolling(window=365) # has default `min_periods=365`
r2 = test_df['Series1'].rolling(window='365D') # has default `min_periods=1`
r3 = test_df['Series1'].rolling(window=365, min_periods=1)
r1.corr(test_df['Series2']).plot()
r2.corr(test_df['Series2']).plot()
r3.corr(test_df['Series2']).plot()
This code produces similarly shaped plots for r2.corr().plot() and r3.corr().plot(), but note that the calculated values still differ: compare r2.corr(test_df['Series2']) with r3.corr(test_df['Series2']).
I think for regular time frequency index, you should just stick to r1.
This is mainly because the results of rolling(365) and rolling('365D') are different.
For example
sub = test_df.head()
sub['Series2'].rolling(2).sum()
Out[15]:
1990-01-01 NaN
1990-01-02 -0.355230
1990-01-03 0.844281
1990-01-04 2.515529
1990-01-05 1.508412
sub['Series2'].rolling('2D').sum()
Out[16]:
1990-01-01 -0.043692
1990-01-02 -0.355230
1990-01-03 0.844281
1990-01-04 2.515529
1990-01-05 1.508412
Since rolling(365) produces a lot of NaNs at the start, the correlations computed the two ways are quite different.
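A deterministic version of the same comparison (made-up values instead of the random series, so the outputs are reproducible):

```python
import numpy as np
import pandas as pd

idx = pd.date_range('1990-01-01', periods=5, freq='D')
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=idx)

# integer window: NaN until the window is full (min_periods defaults to the window size)
print(s.rolling(2).sum().tolist())     # [nan, 3.0, 5.0, 7.0, 9.0]
# offset window: produces a value immediately (min_periods defaults to 1)
print(s.rolling('2D').sum().tolist())  # [1.0, 3.0, 5.0, 7.0, 9.0]
```
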
I have a table with three columns, where the first column is the name of each point, the second is numerical data (a mean), and the third is the second column plus a fixed number. The following is an example of what the data looks like:
I want to plot this table so that I get the following figure.
Is it possible to plot it using Microsoft Excel, Python, or R (Bokeh)?
Alright, I only know how to do this in ggplot2, so I will answer for R here.
This method only works if the data frame is in the format you provided above.
I renamed your columns to Name.of.Method, Mean, and Mean.2.2.
Preparation
Loading csv data into R
df <- read.csv('yourdata.csv', sep = ',')
Change the column names (do this if you don't want to change the code below; otherwise you will need to go through each parameter to match your column names):
names(df) <- c("Name.of.Method", "Mean", "Mean.2.2")
Method 1 - Using geom_segment()
library(ggplot2)

ggplot() +
  geom_segment(data = df, aes(x = Mean,
                              y = Name.of.Method,
                              xend = Mean.2.2,
                              yend = Name.of.Method))
As you can see, geom_segment lets us specify the end position of the line (hence xend and yend).
However, it does not look quite like the image you have above.
The line shape seems to represent an error bar, and ggplot provides an error bar function for exactly that.
Method 2 - Using geom_errorbarh()
ggplot(df, aes(y = Name.of.Method, x = Mean)) +
geom_errorbarh(aes(xmin = Mean, xmax = Mean.2.2), linetype = 1, height = .2)
Usually we don't use this method just to draw a line, but its functionality fits your requirement. You can see that we use xmin and xmax to specify the head and the tail of the line.
The height argument adjusts the height of the bar at both ends of the line.
I would use hbar for this:
from bokeh.io import show, output_file
from bokeh.plotting import figure
output_file("intervals.html")
names = ["SMB", "DB", "SB", "TB"]
p = figure(y_range=names, plot_height=350)
p.hbar(y=names, left=[4,3,2,1], right=[6.2, 5.2, 4.2, 3.2], height=0.3)
show(p)
However, Whisker would also be an option if you really want whiskers instead of interval bars.
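If plain matplotlib is also acceptable for the Python option, hlines gives a similar figure (a sketch reusing the same made-up values as the hbar example above):

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend for this sketch
import matplotlib.pyplot as plt

names = ["SMB", "DB", "SB", "TB"]   # point names
mean = [4, 3, 2, 1]                 # second column
mean_plus = [6.2, 5.2, 4.2, 3.2]    # second column + fixed number

fig, ax = plt.subplots()
y = list(range(len(names)))
# one horizontal line per row, from Mean to Mean + fixed number
ax.hlines(y=y, xmin=mean, xmax=mean_plus, lw=2)
ax.set_yticks(y)
ax.set_yticklabels(names)
fig.savefig('intervals.png')
```
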
%Sampling Frequency
f=8000;
%Sampling Time
t=5;
%Data imported from microsoft Excel
matrix=Book2S1;
%Size Matrix
s=size(matrix);
h=s(1,1);
w=s(1,2);
%Set Up Rows and Columns
rows=(0:(f/2)/(h-1):f/2);
columns=(0:t/(w-1):t);
%plot
mesh(columns,rows,matrix);
xlabel('Time, s')
ylabel('Frequency, Hz')
zlabel('Power Spectral Density, V^2/Hz')
This is the code I typed in to attempt to get a 3D plot. The goal is to obtain a plot that looks like the image listed below, but I keep getting a mesh error:
Error using mesh (line 139)
Data inputs must be numeric, datetime, duration, categorical arrays or objects which can be converted to
double.
Error in Lab_3_1 (line 21)
mesh(columns,rows,matrix);
What my plot is supposed to look like.
The picture didn't want to get saved after being cropped, sorry people.
The following is a link to half of the data being used for this plot.
https://docs.google.com/spreadsheets/d/e/2PACX-1vRMWfmFYDnwMSPzahD8k-aWAXHstbNRdlY4gmOHJoXkLaBb4PY7zF5-41yFkQHR4g0w3LrMFiz3ZqWJ/pubhtml
Try substituting your 4049x50 matrix in place of my random matrix f:
% t=5;
% fs = 8000;
lower = -60;
upper = 20;
f = (upper-lower).*rand(4049,50) + lower;
% s=size(f);
% h=s(1,1);
% w=s(1,2);
% rows=(0:(fs/2)/(h-1):fs/2);
% columns=(0:t/(w-1):t);
mesh(f);
colormap('jet');
colorbar;
xlabel('Time, s')
ylabel('Frequency, Hz')
zlabel('Power Spectral Density V^2/Hz')
ylim([0 4000])
zlim([-100 40])
Using the random data matrix f, I get this:
I figured out that the values imported into MATLAB had been converted into strings. I stopped using the import button and used the xlsread function instead, which let me import the numerical values without them being converted into strings.
Finished Code
Resulting 3D Plot
Thank you guys for the help and looking over the problem.
Background:
I'm working on a program to show a 2d cross section of 3d data. The data is stored in a simple text csv file in the format x, y, z1, z2, z3, etc. I take a start and end point and flick through the dataset (~110,000 lines) to create a line of points between these two locations, and dump them into an array. This works fine, and fairly quickly (takes about 0.3 seconds). To then display this line, I've been creating a matplotlib stacked bar chart. However, the total run time of the program is about 5.5 seconds. I've narrowed the bulk of it (3 seconds worth) down to the code below.
'values' is an array with the x, y, and z values plus a leading identifier, which isn't used in this part of the code. The first plt.bar call plots the bar sections, and the second creates an arbitrary floor at -2000. To generate a continuous-looking section, I'm using a gap of zero between bars.
import matplotlib.pyplot as plt

for values in crossSection:
    prevNum = None
    layerColour = None
    if values != None:
        for i in range(3, len(values)):
            if values[i] != 'n':
                num = float(values[i].strip())
                if prevNum != None:
                    plt.bar(spacing, prevNum - num, width=interval,
                            bottom=num, color=layerColour,
                            edgecolor=None, linewidth=0)
                prevNum = num
                layerColour = layerParams[i].strip()
        if prevNum != None:
            plt.bar(spacing, prevNum + 2000, width=interval, bottom=-2000,
                    color=layerColour, linewidth=0)
    spacing += interval
I'm sure there's a more efficient way to do this, but I'm new to Matplotlib and still unfamiliar with its capabilities. The other main use of time in the code is:
plt.savefig('output.png')
which takes about a second, but I figure this is to be expected to save the file and I can't do anything about it.
Question:
Is there a faster way of generating the same output (a stacked bar chart or something that looks like one) by using plt.bar() better, or a different Matplotlib function?
EDIT:
I forgot to mention in the original post that I'm using Python 3.2.3 and Matplotlib 1.2.0
Leaving this here in case someone runs into the same problem...
While not exactly the same as using bar(), with a sufficiently large dataset (large enough that using bar() takes a few seconds) the results are indistinguishable from stackplot(). If I sort the data into layers using the method given by tcaswell and feed it into stackplot() the chart is created in 0.2 seconds, rather than 3 seconds.
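For reference, the stackplot approach looks roughly like this (a sketch with made-up layer data, not the original cross-section):

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # non-interactive backend for this sketch
import matplotlib.pyplot as plt

# made-up layer thicknesses: one row per layer, one column per position
x = np.arange(5)
layers = np.array([[1.0, 2.0, 1.5, 1.0, 2.5],
                   [0.5, 0.5, 1.0, 1.5, 0.5]])

fig, ax = plt.subplots()
ax.stackplot(x, layers)  # a single call draws each layer as one polygon
fig.savefig('section.png')
```

Because each layer becomes a single PolyCollection rather than one artist per bar, the drawing cost scales with the number of layers instead of the number of data points.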
EDIT
Code provided by tcaswell to turn the data into layers:
import numpy as np

accum_values = []
for values in crosssection:
    accum_values.append([float(v.strip()) for v in values[3:]])
accum_values = np.vstack(accum_values).T
layer_params = [l.strip() for l in layerParams]
bottom = np.zeros(accum_values[0].shape)
It looks like you are drawing each bar individually; you can pass sequences to bar (see this example).
I think something like:
import numpy as np

accum_values = []
for values in crosssection:
    accum_values.append([float(v.strip()) for v in values[3:]])
accum_values = np.vstack(accum_values).T
layer_params = [l.strip() for l in layerParams]
bottom = np.zeros(accum_values[0].shape)
ax = plt.gca()
spacing = interval * np.arange(len(accum_values[0]))
for data, color in zip(accum_values, layer_params):
    ax.bar(spacing, data, bottom=bottom, color=color, linewidth=0, width=interval)
    bottom += data
will be faster (because each call to bar creates one BarContainer, and I suspect the source of your issue is that you were creating one for each bar instead of one for each layer).
I don't really understand what you are doing with the bars whose tops are below their bottoms, so I didn't try to implement that; you will have to adapt this a bit.