With the dataset below, I am trying to plot a line graph in matplotlib. I am trying to write a function that looks at the previous number and checks whether the current number is higher. If the current number is bigger, it would draw a blue line to that point; for example, it would draw a blue line between (1,100) and (2,9313). If it is not greater, as with (6,203542) and (7,203542), a red line would be drawn.
import matplotlib.pyplot as plt
x_long = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
L_Amount_list = [100.00, 9313.38, 43601.28, 61701.69, 74331.88, 198913.81, 153054.54, 119162.10, 74382.25, 203542.82, 160774.71, 220307.19, 366459.26]
plt.plot(x_long,L_Amount_list, color = 'green')
First, build a list of segment colors by comparing each value with the next one. Then, instead of drawing the whole line at once, plot the data two points at a time, using the matching color from the list for each segment.
import matplotlib.pyplot as plt
x_long = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
L_Amount_list = [100.00, 9313.38, 43601.28, 61701.69, 74331.88, 198913.81, 153054.54, 119162.10, 74382.25, 203542.82, 160774.71, 220307.19, 366459.26]
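# 'b' (blue) if the next value is higher, 'r' (red) otherwise
colors = ['b' if a < b else 'r' for a, b in zip(L_Amount_list, L_Amount_list[1:])]
# one segment per pair of consecutive points, so there are len(x_long) - 1 segments
for i in range(len(x_long) - 1):
    plt.plot(x_long[i:i+2], L_Amount_list[i:i+2], color=colors[i])
plt.show()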
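With many points, one plt.plot call per segment gets slow. A minimal sketch of an alternative, assuming the same two lists: build all segments as one array and hand them to a single matplotlib.collections.LineCollection, which accepts one color per segment. The coloring rule is the same as above.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection

x_long = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
L_Amount_list = [100.00, 9313.38, 43601.28, 61701.69, 74331.88, 198913.81, 153054.54,
                 119162.10, 74382.25, 203542.82, 160774.71, 220307.19, 366459.26]

points = np.column_stack([x_long, L_Amount_list])        # shape (13, 2)
# each segment is [[x_i, y_i], [x_(i+1), y_(i+1)]], giving shape (12, 2, 2)
segments = np.stack([points[:-1], points[1:]], axis=1)
# blue where the next value is higher, red otherwise
colors = ['b' if a < b else 'r' for a, b in zip(L_Amount_list, L_Amount_list[1:])]

fig, ax = plt.subplots()
ax.add_collection(LineCollection(segments, colors=colors))
ax.autoscale()  # make sure the view limits include the new segments
plt.show()
```

All segments live in one artist, so the cost no longer grows with the number of plot calls.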
I am using matplotlib.pyplot to make a histogram. Due to the distribution of the data, I want to set up the bins manually. The details are as follows:
Any value = 0 in one bin;
Any value > 60 in the last bin;
Any value > 0 and <= 60 goes in the bins between the two described above, with a bin size of 5.
Could you please give me some help? Thank you.
I'm not sure what you mean by "the bin size is 5". You can plot a histogram by specifying the bins as a sequence:
import matplotlib.pyplot as plt
data = [0, 0, 1, 2, 3, 4, 5, 6, 35, 60, 61, 82, -5] # your data here
plt.hist(data, bins=[0, 0.5, 60, max(data)])
plt.show()
But each bar's width will match its interval, which means that, in this example, the "0" bin will be barely visible:
(Note that 60 falls into the last bin when the bins are given as a sequence, because every bin except the last one is half-open, [left, right). Changing the sequence to [0, 0.5, 60.5, max(data)] would fix that.)
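plt.hist does its counting with numpy.histogram, so you can check where each value lands before plotting. A small sketch with the same data, comparing an edge at 60 with one at 60.5:

```python
import numpy as np

data = [0, 0, 1, 2, 3, 4, 5, 6, 35, 60, 61, 82, -5]

# every bin is [left, right) except the last one, which is [left, right];
# values outside the outer edges (here -5) are not counted at all
counts_60, _ = np.histogram(data, bins=[0, 0.5, 60, max(data)])
counts_605, _ = np.histogram(data, bins=[0, 0.5, 60.5, max(data)])
print(counts_60)   # → [2 7 3]  (60 is counted in the last bin)
print(counts_605)  # → [2 8 2]  (60 is counted in the middle bin)
```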
What you (probably) need is first to categorize your data and then plot a bar chart of the categories:
import matplotlib.pyplot as plt
import pandas as pd
data = [0, 0, 1, 2, 3, 4, 5, 6, 35, 60, 61, 82, -5] # your data here
df = pd.DataFrame()
df['data'] = data
def find_cat(x):
    if x == 0:
        return "0"
    elif x > 60:
        return "> 60"
    elif x > 0:
        return "> 0 and <= 60"
df['category'] = df['data'].apply(find_cat)
df.groupby('category', as_index=False).count().plot.bar(x='category', y='data', rot=0, width=0.8)
plt.show()
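The same categorization can be done without a Python-level function call per row. A sketch using numpy.select, which checks the conditions in order like the if/elif chain in find_cat; the label "other" for values matching none of them (the negative -5) is an assumption, and those values are dropped by the reindex:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

data = [0, 0, 1, 2, 3, 4, 5, 6, 35, 60, 61, 82, -5]
s = pd.Series(data)

order = ["0", "> 0 and <= 60", "> 60"]
# first matching condition wins, as in find_cat
labels = np.select([s == 0, s > 60, s > 0], ["0", "> 60", "> 0 and <= 60"], default="other")
# count each category, in a fixed order, with 0 for empty categories
counts = pd.Series(labels).value_counts().reindex(order, fill_value=0)

counts.plot.bar(rot=0, width=0.8)
plt.show()
```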
Output:
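Building on Tranbi's answer, you could specify the bin edges as detailed in the link they shared. Adding a tiny offset to every edge makes the bins effectively right-closed, so 0 lands in the (-5, 0] bin and 60 in the (55, 60] bin.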
import matplotlib.pyplot as plt
import pandas as pd
data = [0, 0, 1, 2, 3, 4, 5, 6, 35, 60, 61, 82, -6] # your data here
df = pd.DataFrame()
df['data'] = data
bin_edges = [-5, 0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65]
bin_edges_offset = [x+0.000001 for x in bin_edges]
plt.figure()
plt.hist(df['data'], bins=bin_edges_offset)
plt.show()
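One thing to keep in mind with explicit edges: values outside the outermost edges are silently left out. A quick check with numpy.histogram on the same offset edges shows that -6 and 82 are not counted:

```python
import numpy as np

data = [0, 0, 1, 2, 3, 4, 5, 6, 35, 60, 61, 82, -6]
bin_edges = list(range(-5, 70, 5))             # -5, 0, 5, ..., 65
bin_edges_offset = [x + 0.000001 for x in bin_edges]

counts, edges = np.histogram(data, bins=bin_edges_offset)
print(counts.sum(), len(data))  # → 11 13  (-6 and 82 fall outside the edges)
```

If values above 60 should still show up, extend the last edge past max(data) or map them into the last bin first.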
histogram
If I understand correctly, you want a classic histogram for values between 0 (excluded) and 60 (included), plus two extra bins on either side: one for 0 and one for > 60.
In that case I would recommend plotting the 3 regions separately:
import matplotlib.pyplot as plt
data = [0, 0, 1, 2, 3, 4, 5, 6, 35, 60, 61, 82, -3] # your data here
fig, axes = plt.subplots(1,3, sharey=True, width_ratios=[1, 12, 1])
fig.subplots_adjust(wspace=0)
# counting 0 values and drawing a bar between -5 and 0
axes[0].bar(-5, data.count(0), width=5, align='edge')
axes[0].xaxis.set_visible(False)
axes[0].spines['right'].set_visible(False)
axes[0].set_xlim((-5, 0))
# histogram between (0, 60]
axes[1].hist(data, bins=12, range=(0.0001, 60.0001))
axes[1].yaxis.set_visible(False)
axes[1].spines['left'].set_visible(False)
axes[1].spines['right'].set_visible(False)
axes[1].set_xlim((0, 60))
# counting values > 60 and drawing a bar between 60 and 65
axes[2].bar(60, len([x for x in data if x > 60]), width=5, align='edge')
axes[2].xaxis.set_visible(False)
axes[2].yaxis.set_visible(False)
axes[2].spines['left'].set_visible(False)
axes[2].set_xlim((60, 65))
plt.show()
Output:
Edit: If you want to plot a probability density, I would transform the data first and then simply use hist:
import matplotlib.pyplot as plt
data = [0, 0, 1, 2, 3, 4, 5, 6, 35, 60, 61, 82, -3] # your data here
data2 = []
for el in data:
    if el < 0:
        pass  # drop negative values
    elif el > 60:
        data2.append(61)  # put everything above 60 into the last bin
    else:
        data2.append(el)
plt.hist(data2, bins=14, density=True, range=(-4.99,65.01))
plt.show()
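For large datasets the loop can be replaced by NumPy operations. A sketch applying the same rules (drop negatives, map values above 60 to 61):

```python
import numpy as np
import matplotlib.pyplot as plt

data = np.array([0, 0, 1, 2, 3, 4, 5, 6, 35, 60, 61, 82, -3])

# replace values > 60 by 61, then keep only the non-negative ones
data2 = np.where(data > 60, 61, data)[data >= 0]

# raw counts for the same 14 bins, handy for checking before plotting
counts, edges = np.histogram(data2, bins=14, range=(-4.99, 65.01))
plt.hist(data2, bins=14, density=True, range=(-4.99, 65.01))
plt.show()
```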
My legend currently shows the following:
I want the legend labels to run from 0 to 7, but I don't want to add a for-loop to my code to correct each label one by one. My code looks like this:
fig, ax = plt.subplots()
ax.set_title('Clusters by OPTICS in 2D space after PCA')
ax.set_xlabel('First Component')
ax.set_ylabel('Second Component')
points = ax.scatter(
pca_2_spec[:,0],
pca_2_spec[:,1],
s = 7,
marker='o',
c = pred_pca_2_spec,
cmap= 'rainbow')
ax.legend(*points.legend_elements(), title = 'cluster')
plt.show()
Assuming pred_pca_2_spec is a NumPy array with the values [0, 5, 10, 15, 20, 25, 30, 35], you can map them to the range 0-7 by simply dividing each element by 5.
Sample Data:
import numpy as np
from matplotlib import pyplot as plt
np.random.seed(54)
pca_2_spec = np.random.randint(-100, 300, (100, 2))
pred_pca_2_spec = np.random.choice([0, 5, 10, 15, 20, 25, 30, 35], 100)
Plotting Code:
fig, ax = plt.subplots()
ax.set_title('Clusters by OPTICS in 2D space after PCA')
ax.set_xlabel('First Component')
ax.set_ylabel('Second Component')
points = ax.scatter(
pca_2_spec[:, 0],
pca_2_spec[:, 1],
s=7,
marker='o',
c=pred_pca_2_spec / 5, # Divide By 5
cmap='rainbow')
ax.legend(*points.legend_elements(), title='cluster')
plt.show()
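If you would rather keep the original values for the colors and only change the legend text, PathCollection.legend_elements takes a func argument that is applied to the values before they are formatted as labels. A sketch with made-up, deterministic stand-in data (eight labels 0, 5, ..., 35, three points each):

```python
import numpy as np
import matplotlib.pyplot as plt

# hypothetical stand-in data
pred_pca_2_spec = np.repeat(np.arange(0, 40, 5), 3)
rng = np.random.default_rng(54)
pca_2_spec = rng.integers(-100, 300, size=(pred_pca_2_spec.size, 2))

fig, ax = plt.subplots()
points = ax.scatter(pca_2_spec[:, 0], pca_2_spec[:, 1], s=7, marker='o',
                    c=pred_pca_2_spec, cmap='rainbow')
# colors still use 0..35; only the legend labels are divided by 5
handles, labels = points.legend_elements(func=lambda v: v / 5)
ax.legend(handles, labels, title='cluster')
plt.show()
```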
I need to autoscale the y-axis on my bar graph in matplotlib in order to display the small differences in values. It needs to be autoscaled instead of having a fixed limit because the values will change depending on what the user inputs. I've tried yscale log, but that doesn't work for negative values. I've tried symlog, but the graph stays the same. This is my current code:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = range(700, 710, 1)
fig, ax = plt.subplots()
ax.bar(x, y)
plt.show()
Plots are automatically scaled to the full range of the data provided to the API. For a bar plot, that range includes the baseline at 0 that every bar starts from, so bars between 700 and 710 look almost identical.
For a bar plot, the best option to display the differences between the bars is probably to set ylim for vertical bars or xlim for horizontal bars.
negative data
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = range(-700, -750, -5)
fig, ax = plt.subplots(figsize=(7, 5))
ax.bar(x, y)
plt.ylim(min(y), max(y))
positive data
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = range(700, 750, 5)
fig, ax = plt.subplots(figsize=(7, 5))
ax.bar(x, y)
plt.ylim(min(y), max(y))
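Note that with plt.ylim(min(y), max(y)) the bar that equals the lower limit (for positive data) or the upper limit (for negative data) has no visible length. A sketch of a small helper, padded_limits (a hypothetical name), that adds a margin proportional to the data range; the 10% margin is an arbitrary choice:

```python
import matplotlib.pyplot as plt

def padded_limits(values, frac=0.1):
    """Return (low, high) around the data with a margin of frac * data range."""
    lo, hi = min(values), max(values)
    # fall back to a margin based on the value itself, or 1.0, if all values are equal
    pad = (hi - lo) * frac or abs(hi) * frac or 1.0
    return lo - pad, hi + pad

y = range(700, 750, 5)
fig, ax = plt.subplots(figsize=(7, 5))
ax.bar(range(1, 11), y)
ax.set_ylim(*padded_limits(y))
plt.show()
```

The same call works for the negative data, e.g. padded_limits(range(-700, -750, -5)) gives roughly (-749.5, -695.5).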
mixed data
If the data has a wide range of positive and negative values, there's probably no good option on a single axis; as you've noted, symlog doesn't help.
The best option may be to plot the positive and negative data separately.
Boolean masking doesn't work with a list, so convert the lists to NumPy arrays.
import numpy as np
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [700, -700, 710, -710, 720, -720, 730, -730, 740, -740]
x = np.array(x)
y = np.array(y)
mask = y >= 0 # positive mask
pos_y = y[mask] # get the positive values
neg_y = y[~mask] # get the negative values; ~ is not
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(7, 5))
ax1.bar(x[mask], pos_y) # also mask x to plot the bar at the correct x-tick
ax1.set_title('Positive Values')
ax1.set_ylim(min(pos_y), max(pos_y))
ax1.set_xticks(range(0, 12)) # buffer the number of x-ticks, so the x-ticks of the two plots align.
ax2.bar(x[~mask], neg_y)
ax2.set_title('Negative Values')
ax2.set_ylim(min(neg_y), max(neg_y))
ax2.set_xticks(range(0, 12))
plt.tight_layout()  # better spacing between the two plots
plt.show()
I am trying to plot a line through a set of points. Currently, the points are stored in a data frame with the columns X, Y and Type. Whenever Type is 1, I would like to plot the points as a dashed line, and whenever Type is 2, as a solid line.
Currently, I am using a for loop to iterate over all points and plot each point using plt.dash. However, this slows down my run time, since I want to plot more than 40000 points.
So, is there an easy way to plot a line over all the points with a different dash type for each section?
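You could do this by drawing multiple line segments, one per run of consecutive rows with the same type, like this (Bokeh v1.1.0). In the sample data, the boundary rows (counter 3 and 5) appear in both neighbouring runs, so the segments join up: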
import pandas as pd
from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource, Range1d, LinearAxis
line_style = {1: 'dashed', 2: 'solid'}  # type 1 dashed, type 2 solid, as in the question
data = {'name': [1, 1, 1, 2, 2, 2, 1, 1, 1, 1],
'counter': [1, 2, 3, 3, 4, 5, 5, 6, 7, 8],
'score': [150, 150, 150, 150, 150, 150, 150, 150, 150, 150],
'age': [20, 21, 22, 22, 23, 24, 24, 25, 26, 27]}
df = pd.DataFrame(data)
plot = figure(y_range = (100, 200))
plot.extra_y_ranges = {"Age": Range1d(19, 28)}
plot.add_layout(LinearAxis(y_range_name = "Age"), 'right')
# consecutive rows with the same 'name' value get the same run number
runs = (df['name'] != df['name'].shift()).cumsum()
for _, g in df.groupby(runs):
    source = ColumnDataSource(g)
    plot.line(x='counter', y='score', line_dash=line_style[g['name'].iat[0]], source=source)
    plot.circle(x='counter', y='age', color="blue", size=10, y_range_name="Age", source=source)
show(plot)
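Since the question uses matplotlib, here is a sketch of the same run-splitting idea there, assuming a data frame with the columns X, Y and Type (Type 1 dashed, Type 2 solid). STYLE and plot_by_type are hypothetical names. It makes one plot call per run of equal Type values instead of one per point:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Type 1 -> dashed, Type 2 -> solid, as described in the question
STYLE = {1: "--", 2: "-"}

def plot_by_type(ax, df):
    """Draw one line per run of consecutive rows with the same Type.

    Each run also includes the first point of the next run so the pieces
    join up; the connecting segment takes the style of the earlier run.
    """
    x = df["X"].to_numpy()
    y = df["Y"].to_numpy()
    types = df["Type"].to_numpy()
    # positions where a new run starts (the first row always starts one)
    starts = np.flatnonzero(np.r_[True, types[1:] != types[:-1]])
    ends = np.r_[starts[1:], len(df)]
    lines = []
    for start, end in zip(starts, ends):
        stop = min(end + 1, len(df))  # borrow one point from the next run
        line, = ax.plot(x[start:stop], y[start:stop],
                        linestyle=STYLE[types[start]], color="C0")
        lines.append(line)
    return lines

# small made-up example frame
df = pd.DataFrame({"X": [0, 1, 2, 3, 4, 5],
                   "Y": [0, 1, 0, 1, 0, 1],
                   "Type": [1, 1, 2, 2, 2, 1]})
fig, ax = plt.subplots()
lines = plot_by_type(ax, df)
plt.show()
```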
Graph 1:
Adjacency list:
2: [2, 3, 4, 5, 6, 7, 10, 11, 12, 13, 14]
3: [2, 3, 4, 5, 6, 7, 10, 11, 12, 13, 14]
5: [2, 3, 4, 5, 6, 7, 8, 9]
Plot:
import networkx as nx
G = nx.Graph()
G1 = nx.Graph()
import matplotlib.pyplot as plt
for i, j in adj_list.items():
    for k in j:
        G.add_edge(i, k)
pos = nx.spring_layout(G)
nx.draw(G, with_labels=True, node_size = 1000, font_size=20)
plt.draw()
plt.figure() # To plot the next graph in a new figure
plt.show()
Graph 1
In graph 2, I remove a few edges and replot the graph, but the node positions change. How can I store the node positions for the next graph?
You need to reuse your pos variable when plotting the graph. nx.spring_layout returns a dictionary whose keys are the node IDs and whose values are the x, y coordinates at which each node is drawn. Compute pos once and pass the same dictionary to nx.draw through its pos argument, like this:
import networkx as nx
import matplotlib.pyplot as plt
G = nx.Graph()
adj_list = {2: [2, 3, 4, 5, 6, 7, 10, 11, 12, 13, 14],
            3: [2, 3, 4, 5, 6, 7, 10, 11, 12, 13, 14],
            5: [2, 3, 4, 5, 6, 7, 8, 9]}
for i, j in adj_list.items():
    for k in j:
        G.add_edge(i, k)
pos = nx.spring_layout(G)  #<<<<<<<<<< Initialize this only once
nx.draw(G, pos=pos, with_labels=True, node_size=1000, font_size=20)  #<<<<<<<<< pass the pos variable
plt.draw()
plt.figure()  # To plot the next graph in a new figure
plt.show()
Now create a new graph and add only every other edge, i.e. half of them:
cnt = 0
G = nx.Graph()
for i, j in adj_list.items():
    for k in j:
        cnt += 1
        if cnt % 2 == 0:
            continue  # skip every second edge
        G.add_edge(i, k)
nx.draw(G, pos=pos, with_labels=True, node_size=1000, font_size=20)  #<-- same pos variable is used
plt.draw()
plt.figure()  # To plot the next graph in a new figure
plt.show()
As you can see, only half the edges are added, and the node positions remain the same.
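A related sketch, assuming graph 2 is simply graph 1 minus some edges: copy the graph and call remove_edges_from. The node set stays the same, so every node keeps its entry in pos (a node left without edges, like 14 below, is still drawn in its old place), whereas a node missing from pos makes nx.draw raise an error. The seed argument of nx.spring_layout additionally makes the layout reproducible between runs. The three removed edges are arbitrary examples.

```python
import networkx as nx
import matplotlib.pyplot as plt

adj_list = {2: [2, 3, 4, 5, 6, 7, 10, 11, 12, 13, 14],
            3: [2, 3, 4, 5, 6, 7, 10, 11, 12, 13, 14],
            5: [2, 3, 4, 5, 6, 7, 8, 9]}
G = nx.Graph()
for node, neighbours in adj_list.items():
    G.add_edges_from((node, n) for n in neighbours)

# compute the layout once; seed makes it the same on every run of the script
pos = nx.spring_layout(G, seed=42)

# graph 2: same nodes, some edges removed (hypothetical choice of edges)
G2 = G.copy()
G2.remove_edges_from([(2, 14), (3, 14), (5, 9)])

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))
nx.draw(G, pos=pos, ax=ax1, with_labels=True, node_size=1000, font_size=20)
nx.draw(G2, pos=pos, ax=ax2, with_labels=True, node_size=1000, font_size=20)
plt.show()
```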