pyplot - plot with a lot of arrow annotations is very slow - python-3.x

I have code that generates a plot using Python's pyplot. The plot is very fast with a large number of points, but when I also add a large number of arrow annotations it becomes very slow, and each pan or zoom action takes a long time.
This is the line in the code where I add the annotations:
arrow = ax.annotate('', pointA, pointB, annotation_clip=False,
                    arrowprops=dict(arrowstyle='simple'))
Any suggestions on how I can speed up the plot?
Thanks

Related

Plotting glitch with matplotlib [python3]? - EDITED

I have some issues plotting "large" datasets of timeseries data in python, where the time jumps across a few decades in erroneous samples. We aim to visualise only the timestamp (unixtime + custom microseconds) vs index. In this example there are roughly 40k samples.
Basically, I am assuming it is some issue with the rendering of the plot by matplotlib, because when I move the axes, both the scatter points and also the lineplot seem to glitch all over the place. A further bit of evidence for this is that the line in the lineplot is not actually going through the markers, when I zoom in or pan the plot.
The timestamps are continuous and increase by 40ms between steps.
Overview of errors (timestamp is zero -> default date 1.1.1970)
Zoomed in on y axis
More zoomed in
Example of how the plot should look
Timestamp raw data (ignore ms fraction 2)
Code used to plot (using google colab, re-created in Visual Studio Code)
import plotly.express as px

if single_file_or_multiple == "multiple":
    fig = px.line(df_trace, x=df_trace.index, y="time", markers=True,
                  color="rec_id")
    fig.show()

networkx: node spacing when plotting multipartite graph

I want to plot a multipartite graph using networkx. However, when adding more nodes, the plot becomes very crowded. Is there a way to have more space between nodes and partitions?
Looking at the documentation of multipartite_layout, I couldn't find parameters for this.
Of course, one could create complicated formulas for the positions, but since the spacing of multipartite_layout already looks so good for small graphs, I was wondering how to scale this to bigger graphs.
Does anyone have an idea how to do this (efficiently)?
Sample code, generating a graph with three partitions:
import matplotlib.pyplot as plt
import networkx as nx
# build graph:
G = nx.Graph()
for i in range(0, 30):
    G.add_node(i, layer=0)
for i in range(30, 50):
    G.add_node(i, layer=1)
    for j in range(0, 30):
        G.add_edge(i, j)
G.add_node(100, layer=2)
G.add_edge(40, 100)
# plot graph
pos = nx.multipartite_layout(G, subset_key="layer")
plt.figure(figsize=(20, 8))
nx.draw(G, pos, with_labels=False)
plt.axis("equal")
plt.show()
The current, crowded plot:
nx.multipartite_layout returns a dictionary with the following format: {node: array([x, y])}
I suggest you try pos = {p: array_op(pos[p], sx) for p in pos}, where array_op is a function acting on the position of each node, array([x, y]), and sx is a scale factor.
In your case, I think a simple scaling along the x-axis suffices, i.e.
array_op = lambda x, sx: np.array([x[0]*sx, x[1]])
For visualization purposes, I guess this should be equivalent to @JPM's comment. However, this approach gives you the advantage of having the actual transformed position data.
In the end, if such a uniform transformation does not satisfy your needs, you can always manipulate the positions manually, knowing the format of the dict (although it might be less efficient).
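The scaling idea above can be sketched without any plotting library. This is a minimal illustration, not the networkx API: scale_positions is a hypothetical helper, and the small pos dict is made-up data standing in for the output of nx.multipartite_layout.

```python
def scale_positions(pos, sx=1.0, sy=1.0):
    """Return a new dict with every (x, y) position scaled by (sx, sy)."""
    return {node: (xy[0] * sx, xy[1] * sy) for node, xy in pos.items()}


# Made-up stand-in for the dict returned by
# nx.multipartite_layout(G, subset_key="layer").
pos = {0: (-0.5, 0.25), 1: (-0.5, -0.25), 2: (0.5, 0.0)}

# Stretch the layout horizontally by a factor of 4; y stays unchanged.
wide_pos = scale_positions(pos, sx=4.0)
print(wide_pos)
```

The original dict is left untouched, so both layouts stay available. The scaled dict should be usable in place of pos in the nx.draw call, since the positions are still indexable (x, y) pairs.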

Exchanging the axes in gnuplot

I have been wondering about this for a while, and it might already be implemented in gnuplot but I haven't been able to find info online.
When you have a data file, it is possible to exchange the axes and assign the "dummy variable", say x, (in gnuplot's help terminology) to the vertical axis:
plot "data" u 1:2 # x goes to horizontal axis, standard
plot "data" u 2:1 # x goes to vertical axis, exchanged axes
However, when you have a function, you need to resort to a parametric function to do this. Imagine you want to plot x = y² (as opposed to y = x²); then (as far as I know) you need to do:
set parametric
plot t**2,t
which works nicely in this case. I think however that a more flexible approach would be desirable, something like
plot x**2 axes y1x1 # this doesn't work!
Is something like the above implemented, or is there an easy way to use y as dummy variable without the need to set parametric?
So here is another ugly, but gnuplot-only variant: Use the special filename '+' to generate a dynamic data set for plotting:
plot '+' using ($1**2):1
The development version contains a new feature, which allows you to use dummy variables instead of column numbers for plotting with '+':
plot sample [y=-10:10] '+' using (y**2):(y)
I guess that's what comes closest to your request.
From what I have seen, parametric plots are a pretty common way to achieve what you need.
If you really hate parametric plots and you are not afraid of a VERY ugly solution, I can give you my method...
My trick is to use a data file filled with a sequence of numbers. To fit your example, let's make a file sq with a sequence of reals from -10 to 10:
seq -10 .5 10 > sq
And then you can do the magic you want using gnuplot:
plot 'sq' u ($1**2):($1)
And if you use Linux, you can also put the command directly in the plot command:
plot '< seq -10 .5 10' u ($1**2):($1)
I want to add that I'm not proud of this solution and I'd love the "axes y1x1" functionality too.
As far as I know, there is no way to simply invert or exchange the axes in gnuplot when plotting a function.
The reason comes from the way functions are plotted in the normal plotting mode. gnuplot samples a set of points at even intervals along the x axis (the number of points is set by set samples) and computes the function value at each one. This only allows for well-behaved functions, with one y-value per x-value.
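The sampling idea described above can also be done outside gnuplot. The following is a minimal Python sketch, assuming it is acceptable to generate the data with a script; sample_swapped is a hypothetical helper name. It samples y at even intervals, computes x = f(y), and prints two columns.

```python
def sample_swapped(f, y_min, y_max, samples=100):
    """Sample y at even intervals and return a list of (f(y), y) pairs."""
    step = (y_max - y_min) / (samples - 1)
    return [(f(y_min + i * step), y_min + i * step) for i in range(samples)]


# x = y**2 sampled for y in [-10, 10] with a step of 0.5.
points = sample_swapped(lambda y: y ** 2, -10.0, 10.0, samples=41)

# Print "x y" rows, one sample per line.
for x, y in points:
    print(x, y)
```

Redirecting the output into a file gives a two-column data set that can be plotted with using 1:2, the same way as the sq file in the answer above.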

Drawing a straight line averaging a curve

I would like to draw a straight line that represents the average of a curve. I am plotting my data like this:
plot 'dataset' u 2:4 w p smooth bezier
My data consists of multiple columns and I get something like this:
Any ideas on how to do it? I guess it is more of an interpolation than an average. The ups and downs of the curve are not relevant, and it would be much better to have a straight line interpolating the curve.
Fitting a straight line would be fairly easy using fit. However, how could I fit a curve that does not look like a well-known curve? Let me show you an example. How could I fit a smooth curve through the main group of points? Please note that there is some noise in the lower part of the graph that I would not like to represent.
If you want to do some basic statistics on your data, gnuplot has a builtin command stats which may do what you want. Gnuplot offers some internal variables after plotting that contain data about min, max, etc. To see what these are, type show variables all after plotting your data.
Otherwise if you want to fit your data to a line, gnuplot does that as well:
f(x) = a*x + b
fit f(x) 'data.dat' using 2:4 via a,b
plot 'data.dat' using 2:4, f(x)
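For readers who want to see what the straight-line fit computes, here is a minimal Python sketch of an ordinary least-squares fit of f(x) = a*x + b. It is not gnuplot's implementation (gnuplot's fit is an iterative nonlinear least-squares routine); for a linear model the results should agree closely. The function name fit_line and the sample values are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations in x, and of cross deviations in x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Made-up sample data standing in for columns 2 and 4 of the data file.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
a, b = fit_line(xs, ys)
print(f"a = {a:.3f}, b = {b:.3f}")
```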

Histogram in logarithmic scale in gnuplot

I have to plot a histogram with a logarithmic scale on both axes using gnuplot. I need the bins to be equally spaced in log10. Using a logarithmic scale on the y axis isn't a problem. The main problem is creating the bins on the x axis. For example, using 10 bins in log10, the first bins will be [1], [2], [3], ..., [10 - 19], [20 - 29], ..., [100 - 199] and so on. I've searched the net but couldn't find any practical solution. If doing it in gnuplot is too complicated, could you suggest some other software or language to do it?
As someone asked I will explain more specifically what I need to do. I have a (huge) list like this:
1 14000000
2 7000000
3 6500000
.
.
.
.
6600 1
8900 1
15000 1
19000 1
It shows, for example, that 14 million IP addresses have sent 1 packet, 7 million have sent 2 packets, ..., 1 IP address has sent 6600 packets, ..., and 1 IP address has sent 19000 packets. As you can see, the values on both axes are pretty high, so I cannot plot it without a logarithmic scale.
The first thing I tried, because I needed to do it fast, was plotting this list as it is with gnuplot, setting logscale on both axes and using boxes. The result is understandable but not very appropriate. The boxes become thinner and thinner going right along the x axis because, obviously, there are more points in 10-100 than in 1-10! So it becomes a real mess after the second decade.
I tried plotting a histogram with both axes logarithmically scaled, and gnuplot threw the error
Log scale on X is incompatible with histogram plots.
So it appears that gnuplot does not support a log scale on the x axis with histograms.
Plotting in log-log scale in gnuplot is perfectly doable, contrary to the other post in this thread.
One can set the log-log scale in gnuplot with the command set logscale.
Then, the assumption is that we have a file with strictly positive values on both the x-axis and the y-axis. For example, the following is a valid file:
1 0.5
2 0.2
3 0.15
4 0.05
After setting the log-log scale one can plot the file with the command:
plot "file.txt" w p
where of course file.txt is the name of the file. This command will generate the output with points.
Note also that plotting boxes is tricky and is probably not recommended. One first has to restrict the x-range with a command of the form set xrange [1:4] and only then plot with boxes. Otherwise, when the x-range is undefined an error is returned. I am assuming that in this case plot requires (for appropriate x-values) some boxes to have size log(0), which of course is undefined and hence the error is returned.
Hope it is clear and it will also help others.
Have you tried Matplotlib with Python? Matplotlib is a really nice plotting library and when used with Python's simple syntax, you can plot things quite easily:
import matplotlib.pyplot as plot
figure = plot.figure()
axis = figure.add_subplot(1 ,1, 1)
axis.set_yscale('log')
# Rest of plotting code
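Whichever tool does the plotting, the binning itself can be done beforehand. Below is a minimal Python sketch, assuming the input is a list of (packets, number of IP addresses) rows like the listing in the question; log_bin is a hypothetical helper, and only the rows visible in the question are used (the elided rows are left out).

```python
import math


def log_bin(pairs, bins_per_decade=10):
    """Sum counts into bins of equal width in log10(value).

    pairs: iterable of (value, count) with value > 0.
    Returns a sorted list of (bin_lower_edge, total_count).
    """
    totals = {}
    for value, count in pairs:
        # Bin index: how many bin widths log10(value) lies above log10(1).
        index = math.floor(math.log10(value) * bins_per_decade)
        totals[index] = totals.get(index, 0) + count
    return [(10 ** (i / bins_per_decade), c) for i, c in sorted(totals.items())]


# Rows shown in the question: (packets sent, number of IP addresses).
data = [(1, 14000000), (2, 7000000), (3, 6500000),
        (6600, 1), (8900, 1), (15000, 1), (19000, 1)]

for edge, total in log_bin(data, bins_per_decade=10):
    print(edge, total)
```

The printed rows (lower bin edge, summed count) can then be plotted as points or boxes on log-log axes. Empty bins are simply absent from the output, which avoids zero values on a logarithmic y axis.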
