Plotting a list using matplotlib - python-3.x

I have a list like:
[[5, 1.1066720079718957], [10, 1.075297753414681], [15, 1.0958222358382397], [20, 1.092081009894558], [25, 1.0968130408510393]]
I am trying to plot it using matplotlib,
with the values 5, 10, 15, 20, 25 on the x-axis and 1.1066720079718957, 1.075297753414681, 1.0958222358382397, 1.092081009894558, 1.0968130408510393 on the y-axis.
I am not able to do it in a few lines.

import matplotlib.pyplot as plt
data = [[5, 1.1066720079718957], [10, 1.075297753414681], [15, 1.0958222358382397], [20, 1.092081009894558], [25, 1.0968130408510393]]
plt.plot([i[0] for i in data], [i[1] for i in data])
plt.show()
These two links may also be useful if you are working with nested lists and list comprehensions:
Nested lists python
What does "list comprehension" mean? How does it work and how can I use it?
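For a list of [x, y] pairs, another common idiom is to transpose the pairs with zip(*data), which passes each inner list to zip as a separate argument. This is a sketch of that alternative, not part of the original answer:

```python
import matplotlib.pyplot as plt

data = [[5, 1.1066720079718957], [10, 1.075297753414681], [15, 1.0958222358382397],
        [20, 1.092081009894558], [25, 1.0968130408510393]]

# zip(*data) groups all first elements together and all second elements together
xs, ys = zip(*data)   # xs == (5, 10, 15, 20, 25)
plt.plot(xs, ys)
plt.show()
```

Both versions produce the same plot; zip(*data) simply avoids writing two list comprehensions.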

Related

Large rounding errors in python plots

I am trying to plot the simple sequence
$a_n = \frac{3^n+1}{7^n+8}$,
which should tend to 0, but the plot shows a weird effect for values of $n$ near 20.
I use this code:
import numpy as np
import matplotlib.pyplot as plt

def f(n):
    return (3**n+1)/(7**n+8)

n = np.arange(0, 25, 1)
plt.plot(n, f(n), 'bo-')
plt.show()
On the other hand, computing the sequence numerically term by term does not give such large values:
for i in range(0, 25):
    print([i, f(i)])
[0, 0.2222222222222222]
[1, 0.26666666666666666]
[2, 0.17543859649122806]
[3, 0.07977207977207977]
[4, 0.034039020340390205]
[5, 0.014510853404698185]
[6, 0.0062044757218015075]
[7, 0.0026567874970706124]
[8, 0.0011382857610720493]
[9, 0.00048778777316480816]
[10, 0.00020904485804220367]
[11, 8.958964415487241e-05]
[12, 3.8395417418579486e-05]
[13, 1.6455158259653074e-05]
[14, 7.05220773432529e-06]
[15, 3.022374322043928e-06]
[16, 1.295303220696569e-06]
[17, 5.551299431298911e-07]
[18, 2.3791283154177113e-07]
[19, 1.0196264191387531e-07]
[20, 4.3698275080881505e-08]
[21, 1.872783217393992e-08]
[22, 8.026213788319863e-09]
[23, 3.439805909206865e-09]
[24, 1.4742025325067883e-09]
Why is this happening?
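The issue is not with matplotlib but with the dtype of the array that arange produces. You are not specifying a dtype, and the docs for arange state that in that case it is inferred from the inputs. Your inputs are integers, so n is an integer array (32-bit on my machine), and 3**n and 7**n are computed in that fixed-width integer type. Once the powers exceed the largest representable integer, they silently overflow and wrap around, which produces the bogus values in the plot. Your loop does not show this because f(i) receives plain Python ints, which have arbitrary precision. When I check the type: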
print(type(n[0]))
<class 'numpy.int32'>
If I change the dtype to single precision floats, we get the behavior you expect:
n = np.arange(0,25,1, dtype=np.float32)
print(type(n[0]))
<class 'numpy.float32'>
plt.plot(n,f(n),'bo-')
Alternatively, write the step as a float literal, 1. instead of 1, so that arange infers double-precision floats (the resulting array still holds whole-number values [0., 1., 2., ...]).
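To see the wraparound directly, here is a small sketch (not from the original answer) that forces int32 to mimic the answerer's platform, builds a float64 array with the 1. step, and checks that the two agree only until 7**n exceeds the int32 maximum, 2**31 - 1.

```python
import numpy as np

def f(n):
    return (3**n + 1) / (7**n + 8)

n_int = np.arange(0, 25, 1, dtype=np.int32)  # int32 forced to reproduce the overflow
n_flt = np.arange(0, 25, 1.)                 # float step, so arange infers float64

bad = f(n_int)    # 3**n and 7**n computed in int32, wrapping silently on overflow
good = f(n_flt)   # powers computed in float64, no wraparound in this range

# first exponent for which 7**k (exact Python int) no longer fits in int32
first_overflow = next(k for k in range(25) if 7**k > np.iinfo(np.int32).max)
print(first_overflow)  # 12, since 7**11 = 1977326743 fits but 7**12 = 13841287201 does not
```

With 64-bit integers the same thing happens later (7**23 already exceeds the int64 maximum), which is consistent with a glitch appearing only for larger $n$.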

Hiding matplotlib plots while doing tests with pytest

I am writing a simple library that, given a dataset, runs a bunch of analyses and shows a lot of plots along the way (there are many plt.show() calls).
I have written simple pytest tests to check whether the different functions run without errors.
The problem is that once I run pytest, it starts showing all of these plots, and closing them one by one takes a lot of time.
How can I silence all the plots and just see if all the tests passed or not?
If your backend supports interactive display with plt.ion(), then you will need only minimal changes (four lines) to your code:
import matplotlib.pyplot as plt

# define a keyword deciding whether interactive mode should be turned on
show_kw = True  # <--- added
# show_kw = False
if show_kw:  # <--- added
    plt.ion()  # <--- added

# your usual script
plt.plot([1, 2, 3], [4, 5, 6])
plt.show()
plt.plot([1, 3, 7], [4, 6, -1])
plt.show()
plt.plot([1, 3, 7], [4, 6, -1])
plt.show()

# closes all figure windows if show_kw is True
# has no effect if no figure window is open
plt.close("all")  # <--- added
print("finished")
However, if generating the plots is time-consuming, this is not enough: it only saves you from closing them one by one, and the figures are still generated. In this case, you can switch to a non-GUI backend that cannot display figures:
import matplotlib.pyplot as plt
from matplotlib import get_backend
import warnings

show_kw = True
# show_kw = False
if show_kw:
    curr_backend = get_backend()
    # switch to a non-GUI backend, preventing plots from being displayed
    plt.switch_backend("Agg")
    # suppress the UserWarning that Agg cannot show plots
    warnings.filterwarnings("ignore", "Matplotlib is currently using agg")

plt.plot([1, 2, 3], [4, 5, 6])
plt.show()
plt.plot([1, 3, 7], [4, 6, -1])
plt.show()
plt.plot([1, 3, 7], [4, 6, -1])
plt.show()

# display some plots selectively
if show_kw:
    # restore the original backend
    plt.switch_backend(curr_backend)
plt.plot([1, 2, 3], [-2, 5, -1])
plt.show()
print("finished")
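For a test suite specifically, a common alternative (a sketch, not from the original answer) is to select the non-interactive Agg backend before anything is plotted, so every plt.show() call returns immediately without opening a window. Running MPLBACKEND=Agg pytest from the shell has the same effect without touching the code. make_plots below is a hypothetical stand-in for your library function; Matplotlib may emit a UserWarning that Agg cannot show figures.

```python
# In a real suite, these two lines would go at the top of conftest.py
import matplotlib
matplotlib.use("Agg")  # non-GUI backend: figures are never shown on screen
import matplotlib.pyplot as plt


def make_plots():
    """Hypothetical stand-in for a library function that calls plt.show()."""
    for ys in ([4, 5, 6], [4, 6, -1]):
        plt.figure()
        plt.plot([1, 2, 3], ys)
        plt.show()  # with Agg this returns immediately instead of blocking


def test_make_plots_runs():
    make_plots()
    plt.close("all")  # free the figures so they do not pile up across tests
```

pytest then only reports whether test_make_plots_runs raised, which is what the question asks for.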

plot data from two DataFrames with different dates

I'm trying to plot data from two DataFrames in the same figure. The problem is that I'm using calendar dates for my x-axis, and pandas apparently does not like this. The code below shows a minimal example of what I'm trying to do. There are two datasets with numeric values associated with calendar dates, and the data in the second DataFrame comes after the data in the first. I want to plot both in the same figure with the appropriate dates and different line colors. The problem is that the pandas.DataFrame.plot method places the starting dates of both DataFrames at the same position on the chart, which makes the visualization useless.
import pandas as pd
import matplotlib.pyplot as plt
df1 = pd.DataFrame({'date': ['2020-03-10', '2020-03-11', '2020-03-12', '2020-03-13', '2020-03-14', '2020-03-15'],
                    'number': [1, 2, 3, 4, 5, 6]})
df2 = pd.DataFrame({'date': ['2020-03-16', '2020-03-17', '2020-03-18', '2020-03-19'],
                    'number': [7, 6, 5, 4]})
ax = df1.plot(x='date', y='number', label='beginning')
df2.plot(x='date', y='number', label='ending', ax=ax)
plt.show()
The figure created shows both lines starting at the same position on the x-axis.
Is there any way I can fix this? Could I also get the dates on the x-axis shown tilted so they are more legible?
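Because the dates are strings, pandas treats them as labels and plots each DataFrame against its own positions 0, 1, 2, ..., so both lines start at the same x position. You need to cast the 'date' column to datetime dtype using pd.to_datetime: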
import pandas as pd
import matplotlib.pyplot as plt
df1 = pd.DataFrame({'date': ['2020-03-10', '2020-03-11', '2020-03-12', '2020-03-13', '2020-03-14', '2020-03-15'],
                    'number': [1, 2, 3, 4, 5, 6]})
df2 = pd.DataFrame({'date': ['2020-03-16', '2020-03-17', '2020-03-18', '2020-03-19'],
                    'number': [7, 6, 5, 4]})
df1['date'] = pd.to_datetime(df1['date'])
df2['date'] = pd.to_datetime(df2['date'])
ax = df1.plot(x='date', y='number', label='beginning')
df2.plot(x='date', y='number', label='ending', ax=ax)
plt.show()
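The second part of the question, tilted date labels, can be handled with Matplotlib's Figure.autofmt_xdate, which rotates and right-aligns the x tick labels; passing rot=45 to DataFrame.plot is another option. This is a sketch, and the 45-degree angle is an arbitrary choice:

```python
import pandas as pd
import matplotlib.pyplot as plt

df1 = pd.DataFrame({'date': pd.to_datetime(['2020-03-10', '2020-03-11', '2020-03-12',
                                            '2020-03-13', '2020-03-14', '2020-03-15']),
                    'number': [1, 2, 3, 4, 5, 6]})
df2 = pd.DataFrame({'date': pd.to_datetime(['2020-03-16', '2020-03-17',
                                            '2020-03-18', '2020-03-19']),
                    'number': [7, 6, 5, 4]})

ax = df1.plot(x='date', y='number', label='beginning')
df2.plot(x='date', y='number', label='ending', ax=ax)

# rotate the date tick labels by 45 degrees and right-align them
ax.figure.autofmt_xdate(rotation=45)
plt.show()
```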

Is there any way to plot lines of different lengths with bokeh?

Neither downsampling nor resizing, as suggested here, is a feasible option for me.
I tried to pad the shorter lists with NaNs, but that threw an error as well.
Is there any work around?
My code looks something like this:
from bokeh.charts import output_file, Line, save
lines=[[1,2,3],[1,2]]
output_file("example.html",title="toy code")
p = Line(lines,plot_width=600,plot_height=600, legend=False)
save(p)
The bokeh.charts interface (including Line) has since been removed from Bokeh. However, as shown below, you can plot lines of different lengths with multi_line.
From the Bokeh user guide on multiple lines:
from bokeh.plotting import figure, output_file, show

output_file("patch.html")

# width/height replace the older plot_width/plot_height arguments, which Bokeh 3 removed
p = figure(width=400, height=400)
p.multi_line([[1, 3, 2], [3, 4, 6, 6]], [[2, 1, 4], [4, 7, 8, 5]],
             color=["firebrick", "navy"], alpha=[0.8, 0.3], line_width=4)
show(p)
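Applied to the lists from the question, a sketch could look like this. The x-values are an assumption: each list is plotted against its own positions 0, 1, 2, .... multi_line only requires each xs sublist to have the same length as its ys sublist, not the same length as the other lines, so no padding is needed.

```python
from bokeh.plotting import figure, output_file, save

lines = [[1, 2, 3], [1, 2]]  # y-values of unequal length, from the question

# assumed x-values: 0..len-1 for each line
xs = [list(range(len(ys))) for ys in lines]

p = figure(width=600, height=600, title="toy code")
p.multi_line(xs, lines, color=["firebrick", "navy"], line_width=2)

output_file("example.html", title="toy code")
save(p)
```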

Get top-n items of every row in a scipy sparse matrix

After reading this similar question, I still can't fully understand how to go about implementing the solution I'm looking for. I have a sparse matrix, i.e.:
import numpy as np
from scipy import sparse
arr = np.array([[0,5,3,0,2],[6,0,4,9,0],[0,0,0,6,8]])
arr_csc = sparse.csc_matrix(arr)
I would like to efficiently get the top n items of each row, without converting the sparse matrix to dense.
The end result should look like this (assuming n=2):
top_n_arr = np.array([[0,5,3,0,0],[6,0,0,9,0],[0,0,0,6,8]])
top_n_arr_csc = sparse.csc_matrix(top_n_arr)
What is wrong with the linked answer? Does it not work in your case? Or do you just not understand it? Or is it not efficient enough?
I was going to suggest working out a means of finding the top values for a row of a lil format matrix, and applying that row by row. But I would just be repeating my earlier answer.
OK, my previous answer was a start, but it lacked some details on iterating through the lil format. Here's a start; it could probably be cleaned up.
Make the array, and a lil version:
In [42]: arr = np.array([[0,5,3,0,2],[6,0,4,9,0],[0,0,0,6,8]])
In [43]: arr_sp=sparse.csc_matrix(arr)
In [44]: arr_ll=arr_sp.tolil()
The row function from the previous answer:
def max_n(row_data, row_indices, n):
    i = row_data.argsort()[-n:]
    # i = row_data.argpartition(-n)[-n:]
    top_values = row_data[i]
    top_indices = row_indices[i]  # do the sparse indices matter?
    return top_values, top_indices, i
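The commented-out argpartition line avoids a full sort. Below is a variant (the name max_n_sorted is hypothetical, not from the original answer) that uses it and then sorts the selected positions, so the column indices written back to a lil row stay in ascending order, as lil rows normally are; plain argsort orders them by value instead, e.g. [2, 1].

```python
import numpy as np
from scipy import sparse

def max_n_sorted(row_data, row_indices, n):
    """Hypothetical helper: keep the n largest values of one lil row, columns ascending."""
    if row_data.size <= n:
        return row_data, row_indices      # nothing to drop
    i = np.argpartition(row_data, -n)[-n:]  # positions of the n largest, unordered
    i.sort()  # lil column indices are ascending, so sorted positions keep them ascending
    return row_data[i], row_indices[i]

arr = np.array([[0, 5, 3, 0, 2], [6, 0, 4, 9, 0], [0, 0, 0, 6, 8]])
arr_ll = sparse.lil_matrix(arr)
for k in range(arr_ll.shape[0]):
    d, r = max_n_sorted(np.array(arr_ll.data[k]), np.array(arr_ll.rows[k]), 2)
    arr_ll.data[k] = d.tolist()
    arr_ll.rows[k] = r.tolist()
```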
Iterate over the rows of arr_ll, apply this function and replace the elements:
In [46]: for i in range(arr_ll.shape[0]):
    ...:     d, r = max_n(np.array(arr_ll.data[i]), np.array(arr_ll.rows[i]), 2)[:2]
    ...:     arr_ll.data[i] = d.tolist()
    ...:     arr_ll.rows[i] = r.tolist()
    ...:
In [47]: arr_ll.data
Out[47]: array([[3, 5], [6, 9], [6, 8]], dtype=object)
In [48]: arr_ll.rows
Out[48]: array([[2, 1], [0, 3], [3, 4]], dtype=object)
In [49]: arr_ll.tocsc().toarray()
Out[49]:
array([[0, 5, 3, 0, 0],
       [6, 0, 0, 9, 0],
       [0, 0, 0, 6, 8]])
In the lil format, the data is stored in two object-dtype arrays of sublists: data holds the values, and rows holds the column indices.
Viewing the data attributes of a sparse matrix is handy when trying new things. Changing those attributes carries some risk, since it can mess up the whole matrix. But it looks like the lil format can be tweaked safely like this.
The csr format is better than csc for accessing rows. Its data is stored in three arrays: data, indices and indptr. The lil format effectively splits two of those arrays into sublists based on the information in indptr. csr is great for math (multiplication, addition, etc.), but not so good for changing the sparsity (turning nonzero values into zeros).
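Staying in csr is also possible. The sketch below (top_n_per_row is a hypothetical name, not from the original answer) walks the rows via indptr, sets every stored value except the n largest in each row to 0 in place, and then removes those explicit zeros in one pass with eliminate_zeros, so the sparsity is changed only once at the end rather than element by element.

```python
import numpy as np
from scipy import sparse

def top_n_per_row(mat, n):
    """Hypothetical helper: keep only the n largest stored values in each row."""
    if n < 1:
        raise ValueError("n must be at least 1")
    csr = sparse.csr_matrix(mat, copy=True)  # rows are contiguous slices of csr.data
    for start, end in zip(csr.indptr[:-1], csr.indptr[1:]):
        row = csr.data[start:end]  # a view, so assignments modify csr.data
        if row.size > n:
            drop = np.argsort(row)[:-n]  # positions of everything except the n largest
            row[drop] = 0
    csr.eliminate_zeros()  # drop the explicit zeros created above
    return csr.tocsc()

arr = np.array([[0, 5, 3, 0, 2], [6, 0, 4, 9, 0], [0, 0, 0, 6, 8]])
top_n_arr_csc = top_n_per_row(sparse.csc_matrix(arr), 2)
```

With ties, argsort picks which of the equal values survive arbitrarily.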
