Why use overlapping distribution plots in python - python-3.x

My current problem is that I want to know how many likes a user receives before they stop opening their notifications.
I have tried plotting this in Python using Google Colab. However, with a scatter plot, no number or pattern emerges. I also tried plotting it as two overlapping distribution plots.
Python code for the scatter plot:
import plotly.graph_objs as go  # assumed source of the `go` alias

plot_data = [
    go.Scatter(
        x=merge3['user_id'],
        y=merge3['notifopentotal'],
        name='Opened Notification')]
Python code for the distribution plot:
import plotly.figure_factory as ff  # assumed source of the `ff` alias
import plotly.offline as pyoff  # assumed source of the `pyoff` alias

enable_plotly_in_cell()
# Add histogram data
x1 = merge3['totalLikes']
x2 = merge3['notifopentotal']
# Group data together
hist_data = [x1, x2]
group_labels = ['totalLikes', 'notifopentotal']
# Create distplot with custom bin_size
fig = ff.create_distplot(hist_data, group_labels, bin_size=10)
pyoff.iplot(fig)
The scatter plot resulted in disorganized lines. I was expecting to see a point of divergence: the number of likes a user receives before they stop opening their notifications.
The two distribution plots for likes and opened notifications overlapped in some areas, and there was a point where their density curves intersected.
Is it safe to assume that this is the answer?
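One caveat about reading the answer off the overlap: the two density curves are each built from one column on its own, so a crossing point does not pair a user's likes with that same user's opened notifications. A minimal sketch of one alternative, assuming each user contributes one (totalLikes, notifopentotal) pair: bucket users by likes, average the opened notifications per bucket, and look for where the average stops rising. The function name `mean_opens_by_likes`, the bucket size, and the toy rows are hypothetical and not taken from the original data.

```python
from collections import defaultdict

def mean_opens_by_likes(rows, bucket_size=10):
    """Average notifications opened per bucket of total likes."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for likes, opens in rows:
        bucket = (likes // bucket_size) * bucket_size  # lower edge of the bucket
        sums[bucket] += opens
        counts[bucket] += 1
    # Map each bucket's lower edge to the mean number of opened notifications
    return {b: sums[b] / counts[b] for b in sorted(counts)}

# Hypothetical toy rows: (totalLikes, notifopentotal) for six users
rows = [(3, 3), (8, 7), (12, 9), (18, 11), (25, 4), (29, 2)]
trend = mean_opens_by_likes(rows)
for bucket, mean_opens in trend.items():
    print(bucket, mean_opens)
```

With real data, the per-bucket means could be plotted as a single line against the bucket edges, which keeps each user's two values tied together.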

Related

A problem trying to make a 3D animation with gnuplot

I'm trying to write a gnuplot script that makes a 3D animation showing the evolution of a planetary system, whose data I already have in a .txt file.
Here's a part of the file. Each column holds one coordinate of the particles, and each line holds those coordinates at a different time.
I know that I can loop like this:
do for [i=0:200] {
    plot "Sky".i.".txt" u 1:4
}
but this does not help, because my data is laid out by line: each line gives the positions of the particles at a different time.
I would be very thankful if anyone knows how to write that loop so that it iterates over lines instead.

Plotting glitch with matplotlib [python3]? - EDITED

I have some issues plotting "large" datasets of timeseries data in python, where the time jumps across a few decades in erroneous samples. We aim to visualise only the timestamp (unixtime + custom microseconds) vs index. In this example there are roughly 40k samples.
Basically, I am assuming it is some issue with the rendering of the plot by matplotlib, because when I move the axes, both the scatter points and also the lineplot seem to glitch all over the place. A further bit of evidence for this is that the line in the lineplot is not actually going through the markers, when I zoom in or pan the plot.
The timestamps are continuous and increase by 40ms between steps.
Overview of errors (timestamp is zero -> default date 1.1.1970)
Zoomed in on y axis
More zoomed in
Example of how the plot should look
Timestamp raw data (ignore ms fraction 2)
Code used to plot (using Google Colab, re-created in Visual Studio Code):
import plotly.express as px  # assumed source of the `px` alias

if single_file_or_multiple == "multiple":
    fig = px.line(df_trace, x=df_trace.index, y="time", markers=True,
                  color="rec_id")
    fig.show()

Pyplot is too slow when plotting point by point

I have too many (x, y) points (about 280 million) to store in a list of x values and a list of y values and then feed to pyplot.scatter().
So I thought I could send the points to it one by one. But pyplot.scatter() is very slow when used this way.
Are there any alternatives? My end game is to save the graph of points as an image.
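One possible alternative, sketched under the assumption that the x and y ranges are known in advance: since the end result is an image, each point can be dropped straight into a fixed-size pixel grid as it arrives, so memory use depends on the image size rather than on the number of points. The names `rasterize` and `save_pgm` are hypothetical, and the output is a plain PGM file written with the standard library only, not a pyplot figure.

```python
def rasterize(points, x_range, y_range, width=800, height=600):
    """Accumulate streamed (x, y) points into a height x width grid of hit counts."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # skip points outside the chosen view
        col = min(int((x - x_min) / (x_max - x_min) * width), width - 1)
        # Flip rows so that larger y values end up nearer the top of the image
        row = height - 1 - min(int((y - y_min) / (y_max - y_min) * height), height - 1)
        grid[row][col] += 1
    return grid

def save_pgm(grid, path):
    """Write the grid as a binary PGM image: white background, black where any point landed."""
    height, width = len(grid), len(grid[0])
    with open(path, "wb") as f:
        f.write(f"P5 {width} {height} 255\n".encode("ascii"))
        for row in grid:
            f.write(bytes(0 if hits else 255 for hits in row))
```

Any iterator or generator of (x, y) pairs can be passed as `points`, so the full set of values never needs to sit in a list. A vectorised version built on an array library would likely be much faster than this pure-Python loop.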

Very basic pyqtgraph histogram

I'm VERY new to Python (been working in it for only a day and with no previous programming experience), and for a project I'm looking to make a histogram within a dedicated window in an existing GUI. At this point I'm just looking for it to contain randomly generated data. I'm using pyqtgraph, numpy, and Python 3.
I've been trying to rework this code, which I got to work; it creates a scatter plot with random data and places it within my GUI window:
def upd_lowerplot1(self):
    """ Function to update bottom plot 1. """
    # Clear Plot
    self.ui.ui_botplot1.clear()
    # Now generate the plot
    x = np.random.normal(size=1000)
    y = np.random.normal(size=1000)
    self.ui.ui_botplot1.plot(x, y, pen=None, symbol='o')
I haven't found any examples on here or elsewhere that make much sense to me. If anyone could walk me through how to alter this code in baby steps, that would be fantastic. I'm trying to learn, not just get an answer with no understanding.

Histogram in logarithmic scale in gnuplot

I have to plot a histogram in logarithmic scale on both axes using gnuplot. I need the bins to be equally spaced in log10. Using a logarithmic scale on the y axis isn't a problem. The main problem is creating the bins on the x axis. For example, using 10 bins in log10, the first bins will be [1],[2],[3]....[10 - 19][20 - 29].....[100 - 190] and so on. I've searched the net but couldn't find any practical solution. If doing it in gnuplot is too complicated, could you suggest some other software or language to do it?
As someone asked, I will explain more specifically what I need to do. I have a (huge) list like this:
1 14000000
2 7000000
3 6500000
.
.
.
.
6600 1
8900 1
15000 1
19000 1
It shows, for example, that 14 million IP addresses have sent 1 packet, 7 million have sent 2 packets, ..., 1 IP address has sent 6600 packets, ..., 1 IP address has sent 19000 packets. As you can see, the values on both axes are pretty high, so I cannot plot it without a logarithmic scale.
The first thing I tried, because I needed to do it fast, was plotting this list as it is with gnuplot, setting logscale on both axes and using boxes. The result is understandable but not very appropriate. The boxes become thinner and thinner going right along the x axis because, obviously, there are more points in 10-100 than in 1-10! So it became a real mess after the second decade.
I tried plotting a histogram with both axes logarithmically scaled, and gnuplot threw the error
Log scale on X is incompatible with histogram plots.
So it appears that gnuplot does not support a log scale on the x axis with histograms.
Plotting in log-log scale in GnuPlot is perfectly doable, contrary to the other post in this thread.
One can set the log-log scale in GnuPlot with the command set logscale.
Then, the assumption is that we have a file with strictly positive values on both the x-axis and the y-axis. For example, the following file is a valid file:
1 0.5
2 0.2
3 0.15
4 0.05
After setting the log-log scale one can plot the file with the command:
plot "file.txt" w p
where of course file.txt is the name of the file. This command will generate the output with points.
Note also that plotting boxes is tricky and is probably not recommended. One first has to restrict the x-range with a command of the form set xrange [1:4] and only then plot with boxes. Otherwise, when the x-range is undefined an error is returned. I am assuming that in this case plot requires (for appropriate x-values) some boxes to have size log(0), which of course is undefined and hence the error is returned.
Hope it is clear and it will also help others.
Have you tried Matplotlib with Python? Matplotlib is a really nice plotting library and when used with Python's simple syntax, you can plot things quite easily:
import matplotlib.pyplot as plot
figure = plot.figure()
axis = figure.add_subplot(1, 1, 1)
# Log scale on both axes, as the question asks for
axis.set_xscale('log')
axis.set_yscale('log')
# Rest of plotting code
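For the bins themselves, which the question needs to be equally spaced in log10, here is a minimal sketch in plain Python, assuming the input is the (packets, number of IP addresses) list shown in the question. The function name `log_binned_counts` and the `bins_per_decade` parameter are hypothetical. Each value goes into bin floor(log10(value) * bins_per_decade), and the counts falling into the same bin are summed.

```python
import math
from collections import defaultdict

def log_binned_counts(pairs, bins_per_decade=10):
    """Sum counts into bins equally spaced in log10(value); values must be > 0."""
    totals = defaultdict(int)
    for value, count in pairs:
        # Index of the log-spaced bin that contains this value
        idx = math.floor(math.log10(value) * bins_per_decade)
        totals[idx] += count
    # One (left_edge, right_edge, total) tuple per non-empty bin, in order
    return [
        (10 ** (i / bins_per_decade), 10 ** ((i + 1) / bins_per_decade), totals[i])
        for i in sorted(totals)
    ]

# The rows visible in the question's list: (packets sent, number of IP addresses)
pairs = [(1, 14000000), (2, 7000000), (3, 6500000),
         (6600, 1), (8900, 1), (15000, 1), (19000, 1)]
bins = log_binned_counts(pairs, bins_per_decade=1)
for left, right, total in bins:
    print(left, right, total)
```

The returned edges can then be drawn as bars on log-scaled axes. Dividing each total by its bin width is a common extra step when a density rather than a raw count is wanted.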
