Semi Circle Donut Chart with Altair - altair

I'm new to Altair. Could you help me to figure out how to plot something like this in Python?

This seems to work for me.
from math import pi
import pandas as pd
import altair as alt

source = pd.DataFrame({"category": [1, 2, 3, 4, 5, 6], "value": [4, 6, 10, 3, 7, 8]})
pie = alt.Chart(source).mark_arc(innerRadius=75).encode(
    # Map the stacked values onto [-pi/2, pi/2] (the upper half) instead of the full circle
    theta=alt.Theta(field="value", type="quantitative", stack=True,
                    scale=alt.Scale(type="linear", rangeMin=-pi / 2, rangeMax=pi / 2)),
    color=alt.Color(field="category", type="nominal"),
)
pie + pie.mark_text(radius=170, fontSize=16).encode(text='category')
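The scale trick in the answer above works because a stacked theta encoding gives each row a start and an end value (running totals from 0 up to the sum of "value"), and the linear scale then maps that domain onto [rangeMin, rangeMax] = [-π/2, π/2]. In Vega, angle 0 points up (12 o'clock) and angles grow clockwise, so that range is the upper half. The pandas sketch below reproduces the arithmetic by hand to show which angles each category gets; it only illustrates the mapping and is not Altair's or Vega-Lite's internal code.

```python
from math import pi

import pandas as pd

source = pd.DataFrame({"category": [1, 2, 3, 4, 5, 6], "value": [4, 6, 10, 3, 7, 8]})

# Stacking: each slice runs from the running total before it to the running total after it.
end = source["value"].cumsum()
start = end - source["value"]
total = source["value"].sum()  # domain of the scale is [0, total]

# Linear scale: domain [0, total] -> range [rangeMin, rangeMax].
range_min, range_max = -pi / 2, pi / 2


def to_angle(v):
    """Map a stacked value to an angle in radians (0 = up, clockwise positive)."""
    return range_min + (range_max - range_min) * v / total


source["theta_start"] = to_angle(start)
source["theta_end"] = to_angle(end)
print(source.round(3))
```

Each slice's angular width is π · value / total, so category 3 (value 10 out of 38) gets the widest wedge.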

I don't think it is currently possible to combine coloring per category with a half pie/donut chart. You can combine it with a full chart, or have a half chart of a single color:
import pandas as pd
import altair as alt

source = pd.DataFrame({"category": [1, 2, 3, 4, 5, 6], "value": [4, 6, 10, 3, 7, 8]})
# Full donut, colored per category
pie = alt.Chart(source).mark_arc(innerRadius=75).encode(
    theta=alt.Theta(field="value", type="quantitative", stack=True),
    color=alt.Color(field="category", type="nominal"),
)
pie + pie.mark_text(radius=170, fontSize=16).encode(text='category')

from math import pi
import pandas as pd
import altair as alt

source = pd.DataFrame({"category": [1, 2, 3, 4, 5, 6], "value": [4, 6, 10, 3, 7, 8]})
# Half donut of a single color: fixed start and end angles set on the mark itself
alt.Chart(source).mark_arc(innerRadius=75, theta=pi/2, theta2=-pi/2)

Related

What's CellArray.GetNumberOfCells()?

I create a vtkCellArray, but no matter what input I give it, its GetNumberOfCells() always returns 3.
Why is the result 3?
How can I get the real number of cells?
Here is the test code:
import vtk
import numpy as np
from vtk.util.numpy_support import numpy_to_vtkIdTypeArray

def calc_num_cells(cell_ids):
    cell_ids = np.concatenate(cell_ids)
    cell_array = vtk.vtkCellArray()
    cell_array.SetCells(vtk.VTK_LINE,
                        numpy_to_vtkIdTypeArray(cell_ids))
    print(cell_array.GetNumberOfCells())

calc_num_cells(
    [
        [4, 0, 1, 2, 3],
        [2, 4, 5],
        [2, 6, 7],
        [2, 8, 9],
        [2, 10, 11],
    ]
)
# output: 3
calc_num_cells(
    [
        [4, 0, 1, 2, 3],
    ]
)
# output: 3
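I was just using SetCells the wrong way.
The first parameter of SetCells should be the number of cells, not a cell type. VTK_LINE is the integer constant 3 in VTK's cell-type enumeration, which is why GetNumberOfCells() always returned 3; the first argument should be len(cell_ids) instead.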
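To make the fix concrete: the flat id array passed to SetCells uses VTK's legacy connectivity layout, where each cell is stored as [number_of_points, point_id_0, point_id_1, ...]. The sketch below uses a hypothetical helper, count_legacy_cells (not part of VTK), that walks this layout with plain NumPy to show what the first argument should be. The corrected VTK call itself is only shown in a comment, so the sketch runs without VTK installed.

```python
import numpy as np

# The same per-cell lists as in the question, in VTK's legacy layout.
cell_ids = [
    [4, 0, 1, 2, 3],
    [2, 4, 5],
    [2, 6, 7],
    [2, 8, 9],
    [2, 10, 11],
]
flat = np.concatenate(cell_ids)


def count_legacy_cells(flat_ids):
    """Hypothetical helper: count cells in a legacy [n, ids..., n, ids...] array."""
    count = 0
    i = 0
    while i < len(flat_ids):
        n_points = int(flat_ids[i])
        i += n_points + 1  # skip the size entry plus its point ids
        count += 1
    if i != len(flat_ids):
        raise ValueError("array does not end on a cell boundary")
    return count


num_cells = count_legacy_cells(flat)
print(num_cells)  # 5
# With VTK, the corrected call would then be:
#   cell_array.SetCells(num_cells, numpy_to_vtkIdTypeArray(flat))
# which here is the same as passing len(cell_ids).
```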

How to convert a grouped pandas dataframe into a numpy 3d array and apply right-padding?

In order to feed data into an LSTM network to predict remaining useful life (RUL), I need to create a 3D NumPy array (no. of machines, no. of sequences, no. of variables).
I have already tried to combine solutions from Stack Overflow and managed to create a prototype (which you can see below).
import numpy as np
import tensorflow as tf
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 2, 3, 3, 3, 3],
                   'V1': [1, 2, 2, 3, 3, 4, 2],
                   'V2': [4, 2, 3, 2, 1, 5, 1],
                   })
df_desired_result = np.array([[[1, 4], [2, 2], [-99, -99]],
                              [[2, 3], [-99, -99], [-99, -99]],
                              [[3, 2], [3, 1], [4, 5]]])
max_len = df['ID'].value_counts().max()

def pad_df(df, cols, max_seq, group_col='ID'):
    # DataFrame.as_matrix was removed in pandas 1.0; to_numpy is its replacement.
    # dtype=object keeps the ragged per-group arrays as an array of arrays.
    array_for_pad = np.array(list(df[cols].groupby(df[group_col]).apply(pd.DataFrame.to_numpy)),
                             dtype=object)
    padded_array = tf.keras.preprocessing.sequence.pad_sequences(array_for_pad,
                                                                 padding='post',
                                                                 maxlen=max_seq,
                                                                 value=-99
                                                                 )
    return padded_array

# testing prototype
pad_df(df, ['V1', 'V2'], max_len)
But when I apply the code above to my data, it applies the right-padding correctly, but all values are set to 0.0.
I can't fully figure out this behaviour. I noticed that the first line of my function returns an array of nested arrays for 'array_for_pad'.
Here is a screenshot of the result:
[Screenshot: result padding]
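One thing worth checking (an assumption, since the screenshot is not available): tf.keras.preprocessing.sequence.pad_sequences defaults to dtype='int32', so if the real variables are floats between 0 and 1 they would be truncated to 0; passing dtype='float32' to pad_sequences may be enough. Alternatively, the padding can be done with NumPy alone. The sketch below uses a hypothetical helper, pad_groups, that allocates the padded array as float up front and copies each group's rows in, truncating at the end if a group is longer than max_seq. Note that ID 3 has four rows, so with max_len = 4 the result has shape (3, 4, 2) rather than the (3, 3, 2) of df_desired_result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 2, 3, 3, 3, 3],
                   'V1': [1, 2, 2, 3, 3, 4, 2],
                   'V2': [4, 2, 3, 2, 1, 5, 1]})


def pad_groups(df, cols, group_col='ID', max_seq=None, value=-99.0):
    """Hypothetical helper: build a (machines, max_seq, variables) array, padded at the end."""
    # One 2-D float array per machine, in sorted group-key order.
    groups = [g[cols].to_numpy(dtype=float) for _, g in df.groupby(group_col)]
    if max_seq is None:
        max_seq = max(len(g) for g in groups)
    # Fill with the padding value first; float dtype keeps fractional values intact.
    out = np.full((len(groups), max_seq, len(cols)), value, dtype=float)
    for i, g in enumerate(groups):
        g = g[:max_seq]  # keep the first max_seq rows
        out[i, :len(g)] = g
    return out


padded = pad_groups(df, ['V1', 'V2'])
print(padded.shape)  # (3, 4, 2)
```

Passing max_seq=3 truncates ID 3 to its first three rows and reproduces df_desired_result exactly.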

How do I get the x and y labels to appear when displaying more than one histogram using the pandas hist() function with the by argument?

I am trying to create a series of graphs that share x and y labels. I can get the graphs to each have a label (explained well here!), but this is not what I am looking for.
I want one label that covers the y axis of both graphs, and same for the x axis.
I've been looking at the matplotlib and pandas documentation, and I was unable to find anything that addresses this issue when using the by argument.
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 1, 2, 3, 4, 3, 4],
                   'B': [1, 7, 2, 4, 1, 4, 8, 3],
                   'C': [1, 4, 8, 3, 1, 7, 3, 4],
                   'D': [1, 2, 6, 5, 8, 3, 1, 7]},
                  index=[0, 1, 2, 3, 5, 6, 7, 8])
histo = df.hist(by=df['A'], sharey=True, sharex=True)
plt.ylabel('ylabel') # I assume the label is created on the 4th graph and then deleted?
plt.xlabel('xlabel') # Creates a label on the 4th graph.
plt.tight_layout()
plt.show()
The output looks like this.
Is there any way I can create a y-label that spans the entire left side of the figure (not each graph individually), and likewise an x-label along the bottom?
As you can see, the x label only appears on the last graph created, and there is no y label.
Help?
This is one way to do it indirectly, by placing the x- and y-labels as figure-level text. I am not aware of a direct way using plt.xlabel or plt.ylabel. When passing an axes object to df.hist, the sharex and sharey arguments have to be passed to plt.subplots() instead. With this approach you can manually control where the labels go. For example, if you think the x-label is too close to the ticks, you can use f.text(0.5, -0.02, 'X-label', ...) to shift it slightly lower.
import matplotlib.pyplot as plt
import pandas as pd
f, ax = plt.subplots(2, 2, figsize=(8, 6), sharex=True, sharey=True)
df = pd.DataFrame({'A': [1, 2, 1, 2, 3, 4, 3, 4],
                   'B': [1, 7, 2, 4, 1, 4, 8, 3],
                   'C': [1, 4, 8, 3, 1, 7, 3, 4],
                   'D': [1, 2, 6, 5, 8, 3, 1, 7]},
                  index=[0, 1, 2, 3, 5, 6, 7, 8])
histo = df.hist(by=df['A'], ax=ax)
f.text(0, 0.5, 'Y-label', ha='center', va='center', fontsize=20, rotation='vertical')
f.text(0.5, 0, 'X-label', ha='center', va='center', fontsize=20)
plt.tight_layout()
I fixed the issue with the variable number of sub-plots using something like this:
import math
import matplotlib.pyplot as plt

cols = 3
n = df['A'].nunique()  # one sub-plot per group in column A
rows = math.ceil(n / cols)
# squeeze=False keeps `axes` 2-D even when there is only one row
fig, axes = plt.subplots(rows, cols, squeeze=False)
flat_axes = axes.ravel()
# Hide the unused axes at the end of the grid
for ax in flat_axes[n:]:
    ax.axis('off')
hist = df.hist(by=df['A'], ax=flat_axes[:n])
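Since Matplotlib 3.4, Figure.supxlabel and Figure.supylabel add a single centered label to the whole figure, which is a more direct alternative to placing f.text by hand. A minimal sketch reusing the question's data, assuming Matplotlib 3.4 or newer; constrained_layout=True is a choice of this sketch so the figure-level labels get their own space.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 1, 2, 3, 4, 3, 4],
                   'B': [1, 7, 2, 4, 1, 4, 8, 3],
                   'C': [1, 4, 8, 3, 1, 7, 3, 4],
                   'D': [1, 2, 6, 5, 8, 3, 1, 7]},
                  index=[0, 1, 2, 3, 5, 6, 7, 8])

# Share axes when creating the grid; df.hist ignores sharex/sharey when ax is passed.
fig, axes = plt.subplots(2, 2, figsize=(8, 6), sharex=True, sharey=True,
                         constrained_layout=True)
df.hist(by=df['A'], ax=axes)

# One label for the whole figure on each side.
xlab = fig.supxlabel('X-label', fontsize=20)
ylab = fig.supylabel('Y-label', fontsize=20)
```

Call plt.show() (or fig.savefig(...)) afterwards to display or save the figure.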

How to get indices of a specific number in an array?

I want to pick the indices of number 8 without knowing its position in the array.
a = np.arange(10)
You can use np.where like this:
>>> import numpy as np
>>> a = np.array([1,4,8,2,6,7,9,8,7,8,8,9,1,0])
>>> a
array([1, 4, 8, 2, 6, 7, 9, 8, 7, 8, 8, 9, 1, 0])
>>> np.where(a==8)[0]
array([ 2,  7,  9, 10], dtype=int64)
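For a 1-D array, np.flatnonzero gives the same indices without the [0] indexing, and np.argwhere returns one row of coordinates per match, which is convenient for arrays with more than one dimension. A short sketch using the same array as above:

```python
import numpy as np

a = np.array([1, 4, 8, 2, 6, 7, 9, 8, 7, 8, 8, 9, 1, 0])

# Indices where the boolean mask is True, i.e. where a == 8.
idx = np.flatnonzero(a == 8)
print(idx.tolist())  # [2, 7, 9, 10]

# For a 2-D array, argwhere returns (row, column) pairs for each match.
b = a[:12].reshape(3, 4)
coords = np.argwhere(b == 8)
print(coords.tolist())
```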

Matplotlib: plot the entire column values in pandas

I have the following data frame my_df:
   my_1  my_2  my_3
0     5     7     4
1     3     5    13
2     1     2     8
3    12     9     9
4     6     1     2
I want to make a plot where the x-axis shows the categorical values my_1, my_2 and my_3, and the y-axis shows integers. For each column in my_df, I want to plot all 5 of its values at x = my_i. What kind of plot should I use in matplotlib? Thanks!
You could make a bar chart:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'my_1': [5, 3, 1, 12, 6], 'my_2': [7, 5, 2, 9, 1], 'my_3': [4, 13, 8, 9, 2]})
df.T.plot(kind='bar')
plt.show()
or a scatter plot:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'my_1': [5, 3, 1, 12, 6], 'my_2': [7, 5, 2, 9, 1], 'my_3': [4, 13, 8, 9, 2]})
fig, ax = plt.subplots()
cols = np.arange(len(df.columns))
x = np.repeat(cols, len(df))
y = df.values.ravel(order='F')
color = np.tile(np.arange(len(df)), len(df.columns))
scatter = ax.scatter(x, y, s=150, c=color)
ax.set_xticks(cols)
ax.set_xticklabels(df.columns)
cbar = plt.colorbar(scatter)
cbar.set_ticks(np.arange(len(df)))
plt.show()
Just for fun, here is how to make the same scatter plot using Pandas' df.plot:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'my_1': [5, 3, 1, 12, 6], 'my_2': [7, 5, 2, 9, 1], 'my_3': [4, 13, 8, 9, 2]})
columns = df.columns
index = df.index
df = df.stack()
df.index.names = ['color', 'column']
df = df.rename('y').reset_index()
df['x'] = pd.Categorical(df['column']).codes
ax = df.plot(kind='scatter', x='x', y='y', c='color', colorbar=True,
             cmap='viridis', s=150)
ax.set_xticks(np.arange(len(columns)))
ax.set_xticklabels(columns)
cbar = ax.collections[-1].colorbar
cbar.set_ticks(index)
plt.show()
Unfortunately, it requires quite a bit of DataFrame manipulation just to call
df.plot, and then some extra matplotlib calls are needed to set the tick
marks on the scatter plot and colorbar. Since Pandas is not saving effort here,
I would go with the NumPy/matplotlib scatter approach shown above.
