Domain of the axes does not exactly define how the chart is rendered - altair

While specifying domain starting at zero:
alt.Scale(domain=(0, 1000))
I still obtain a plot with negative values on the X axis:
I don't understand why it behaves like this. How can I force the axis to always start exactly at the value provided in the domain?
Code for plotting:
data=pd.DataFrame({'foo': {0: 250,
1: 260,
2: 270,
3: 280,
},
'cnt': {0: 6306,
1: 5761,
2: 5286,
3: 4785,
}})
alt.Chart(data).mark_bar().encode(
    alt.X(
        'foo',
        scale=alt.Scale(domain=(0, 1000))
    ),
    alt.Y("cnt")
)
Lib version:
altair 3.2.0

For bar marks, Vega-Lite automatically adds a padding to domains (this is not the case for other mark types). The fact that it does this even when the user explicitly specifies the domain is a bug; see vega/vega-lite#5295.
As a workaround until this bug is fixed, you can turn this behavior off by setting padding=0:
import altair as alt
import pandas as pd
data=pd.DataFrame({
'foo': [250, 260, 270, 280],
'cnt': [6306, 5761, 5286, 4785]
})
alt.Chart(data).mark_bar().encode(
alt.X(
'foo',
scale=alt.Scale(domain=(0, 1000), padding=0)
),
alt.Y("cnt")
)

Related

pandas: draw plot using dict and labels on top of each bar

I am trying to plot a graph from a dict, which works fine but I also have a similar dict with values that I intend to write on top of each bar.
This works fine for plotting the graph:
import matplotlib.pyplot as plt
import matplotlib as mpl
import pandas as pd
mpl.rcParams['axes.formatter.useoffset'] = False
df = pd.DataFrame([population_dct])
df.sum().sort_values(ascending=False).plot.bar(color='b')
plt.savefig("temp_fig.png")
Where the population_dct is:
{'pak': 210, 'afg': 182, 'ban': 94, 'ind': 32, 'aus': 14, 'usa': 345, 'nz': 571, 'col': 47, 'iran': 2}
Now I have another dict, called counter_dct:
{'pak': 1.12134, 'afg': 32.4522, 'ban': 3.44, 'ind': 1.123, 'aus': 4.22, 'usa': 9.44343, 'nz': 57.12121, 'col': 2.447, 'iran': 27.5}
I need the second dict items to be shown on top of each bar from the previous graph.
What I tried:
df = pd.DataFrame([population_dct])
df.sum().sort_values(ascending=False).plot.bar(color='g')
for i, v in enumerate(counter_dct.values()):
    plt.text(v, i, " " + str(v), color='blue', va='center', fontweight='bold')
This has two issues:
counter_dct.values() messes up the sequence of values
The values are shown at the bottom of each graph with poor alignment
Perhaps there's a better way to achieve this?
Since you are drawing the bars in descending order, first sort population_dct in descending order by value:
temp_dct = dict(sorted(population_dct.items(), key=lambda x: x[1], reverse=True))
Then iterate over temp_dct and look up each key's label in counter_dct:
counter = 0  # x position of the current bar, starting from the left
for key, val in temp_dct.items():
    top_val = counter_dct[key]
    plt.text(x=counter, y=val + 2, s=f"{top_val}", fontdict=dict(fontsize=11))
    counter += 1
plt.xticks(rotation=45, ha='right')
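The same lookup-by-key idea can be checked without drawing anything. The sketch below uses the two dicts from the question; the helper name bar_labels and its offset parameter are made up for illustration and are not part of matplotlib or pandas.

```python
population_dct = {'pak': 210, 'afg': 182, 'ban': 94, 'ind': 32, 'aus': 14,
                  'usa': 345, 'nz': 571, 'col': 47, 'iran': 2}
counter_dct = {'pak': 1.12134, 'afg': 32.4522, 'ban': 3.44, 'ind': 1.123, 'aus': 4.22,
               'usa': 9.44343, 'nz': 57.12121, 'col': 2.447, 'iran': 27.5}


def bar_labels(heights, labels, offset=2):
    """Return (x, y, text) tuples for bars drawn in descending height order.

    Hypothetical helper: x is the bar index, y sits `offset` above the bar top,
    and text is looked up by key so it cannot drift out of sync with the bars.
    """
    ordered = sorted(heights.items(), key=lambda kv: kv[1], reverse=True)
    return [(x, height + offset, str(labels[key]))
            for x, (key, height) in enumerate(ordered)]


positions = bar_labels(population_dct, counter_dct)
print(positions[0])  # label for the tallest bar ('nz')
```

Each tuple can then be passed to plt.text(x, y, text, ha='center') so the label is centered over its bar.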

global change to the universal default color of plotly #444

Currently, the default color of traces, borders, ticks, outlines, etc, as noted here, is #444. Has anyone found a way to change this default setting instead of specifying each and every single feature one would like to change?
You can start by defining the template you want, then pass template=template to every plot that should use these settings:
import plotly.express as px
import plotly.graph_objs as go
# Create the template
# Put this block at the top of your file
template = go.layout.Template(
layout=go.Layout(
paper_bgcolor="#333333",
plot_bgcolor="#333333"
)
)
fig = px.scatter(x=[0, 1, 2, 3, 4], y=[0, 1, 4, 9, 16], template=template)
fig.show()

How to create conditional plots of groupby objects using matplotlib/seaborn?

I have data from a University where each entry is a student with the columns (first name, last name, major, sex, etc.)
I have created an aggregation of counts of males and females in each major:
gender_counts = (only_science.groupby(['no_concentration', 'sex'], as_index=False)
.size()
.unstack(fill_value=0)
.sort_values('Female', ascending=False)
)
Output:
DataFrame
Here is the plot that I created:
ax3 = gender_counts.plot(kind='bar', title='Gender Breakdown by Major')
ax3.set_xlabel("CoS Majors")
ax3.set_ylabel("Number of Applicants")
plt.show()
Output: Majors Plot by Gender
Goal: Create individual graphs of each major using the aggregated data so that the scale can be more meaningful and not be skewed by Biological Sciences.
I have tried to use sns.FacetGrid() and FacetGrid.map() and also tried sns.catplot(), but I'm not sure what to use for the parameters, and I get a plethora of errors.
If I can create a bar chart for one of the majors then I can just create a for loop to iterate over gender_counts and make all of the bar charts.
Thank you for your help and I apologize if there are elements missing from this question. This is my first stack overflow question.
You can use sns.catplot with sharey=False:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({'no_concentration': {0: 'Biological Sciences',1: 'Pre-Medicine',2: 'Biochemistry',3: 'Pre-Dentistry',4: 'Chemistry',
5: 'Mathematics',6: 'Physics',7: 'Microbiology',8: 'Geology',9: 'Biological Sciences',10: 'Pre-Medicine',
11: 'Biochemistry',12: 'Pre-Dentistry',13: 'Chemistry',14: 'Mathematics',15: 'Physics',16: 'Microbiology',17: 'Geology'},
'Sex': {0: 'Female',1: 'Female',2: 'Female',3: 'Female',4: 'Female',5: 'Female',6: 'Female',7: 'Female',8: 'Female',9: 'Male',10:
'Male',11: 'Male',12: 'Male',13: 'Male',14: 'Male',15: 'Male', 16: 'Male',17: 'Male'},
'value': {0: 1282,1: 1267, 2: 291, 3: 187, 4: 175, 5: 89, 6: 75, 7: 57,8: 18,9: 534,10: 445,11: 122,12: 76,13: 80,14: 76,15: 118,16: 29,17: 31}})
# Optional: I use dark mode in Jupyter notebook. Set the style before plotting so it takes effect.
plt.style.use('dark_background')
sns.set_context('paper', font_scale=1.4)
# sharey=False gives every facet its own y-axis scale.
# catplot creates its own figure, so no separate plt.figure() call is needed.
sns.catplot(data=df, x='Sex', y='value', col='no_concentration', kind='bar',
            col_wrap=3, palette=sns.color_palette("icefire"), sharey=False)
plt.show()
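The gender_counts table in the question is wide (one row per major, one column per sex), while sns.catplot expects long data with one row per major and sex. A minimal sketch of that reshaping, assuming gender_counts has the majors as its index and Female/Male as columns; the three rows are copied from the sample data above, and the name long_df is made up:

```python
import pandas as pd

# Stand-in for the question's gender_counts (wide format)
gender_counts = pd.DataFrame(
    {"Female": [1282, 1267, 291], "Male": [534, 445, 122]},
    index=pd.Index(
        ["Biological Sciences", "Pre-Medicine", "Biochemistry"],
        name="no_concentration",
    ),
)

# Move the index into a column, then stack the Female/Male columns into rows
long_df = gender_counts.reset_index().melt(
    id_vars="no_concentration", var_name="Sex", value_name="value"
)
print(long_df)
```

The resulting long_df has the same three columns as df above, so it can be passed to sns.catplot in its place.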

My Bar Plot is not showing bars for all the data values

I have a DataFrame that contains two features namely LotFrontage and LotArea.
I want to plot a bar graph to show the relation between them.
My code is:
import matplotlib.pyplot as plt
import pandas as pd
visual_df=pd.DataFrame()
visual_df['area']=df_encoded['LotArea']
visual_df['frontage']=df_encoded['LotFrontage']
visual_df.dropna(inplace=True)
plt.figure(figsize=(15,10))
plt.bar(visual_df['area'],visual_df['frontage'])
plt.show()
The column LotFrontage has a float dtype.
What is wrong with my code, and how can I correct it?
To see a relationship between two features, a scatter plot is usually much more informative than a bar plot. To draw a scatter plot via matplotlib: plt.scatter(visual_df['area'], visual_df['frontage']). You can also invoke pandas scatter plot, which automatically adds axis labels: df.plot(kind='scatter', x='area', y='frontage').
For a lot of statistical purposes, seaborn can be handy. sns.regplot not only creates the scatter plot but automatically also tries to fit the data with a linear regression and shows a confidence interval.
from matplotlib import pyplot as plt
import pandas as pd
import seaborn as sns
area = [8450, 9600, 11250, 9550, 14260, 14115, 10084, 6120, 7420, 11200, 11924, 10652, 6120, 10791, 13695, 7560, 14215, 7449, 9742, 4224, 14230, 7200]
frontage = [65, 80, 68, 60, 84, 85, 75, 51, 50, 70, 85, 91, 51, 72, 68, 70, 101, 57, 75, 44, 110, 60]
df = pd.DataFrame({'area': area, 'frontage': frontage})
sns.regplot(x='area', y='frontage', data=df)
plt.show()
PS: The main problem with the intended bar plot is that the x-values lie very far apart. Moreover, the default bar width is 0.8 (in data units), so with x-values in the thousands the bars become too narrow to see. Adding an explicit edge color can make them visible:
plt.bar(visual_df['area'], visual_df['frontage'], ec='blue')
You could set a larger width, but then some bars would start to overlap.
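A rough back-of-the-envelope check shows how narrow the bars are, using the sample area values from the snippet above and matplotlib's default bar width of 0.8. The 1500-pixel axes width is an assumption (roughly a 15-inch-wide figure at 100 dpi, ignoring margins), so treat the result as an order of magnitude only.

```python
area = [8450, 9600, 11250, 9550, 14260, 14115, 10084, 6120, 7420, 11200, 11924,
        10652, 6120, 10791, 13695, 7560, 14215, 7449, 9742, 4224, 14230, 7200]

BAR_WIDTH = 0.8       # default width of plt.bar, in data units
AXES_WIDTH_PX = 1500  # assumed axes width in pixels (not measured)

# The x-axis has to cover at least the span between the smallest and largest area
x_span = max(area) - min(area)

# Fraction of the axis taken by one bar, converted to pixels
bar_px = BAR_WIDTH / x_span * AXES_WIDTH_PX
print(f"x span: {x_span}, bar width: {bar_px:.2f} px")
# prints: x span: 10036, bar width: 0.12 px
```

At a fraction of a pixel per bar, the fill is effectively invisible, which is why only an edge color or a much larger width makes the bars show up.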
Alternatively, pandas barplot would treat the x-axis as categorical, showing all x-values next to each other, as if they were strings. The bars are drawn in the order of the dataframe, so you might want to sort first:
df.sort_values('area').plot(kind='bar', x='area', y='frontage')
plt.tight_layout()

Python Bokeh donut chart category, subcategory mean calculation

I am using the following code to draw a bokeh donut chart to visualize the mean prices across different categories and subcategories.
d = Donut(train.groupby(['main_cat','sub_cat']).price.mean(), hover_text='mean',width=500,height=500)
show(d)
For sub_cat, the values are calculated correctly, but for main_cat, instead of showing the mean for main_cat, it shows the sum of the sub_cat means under that main_cat. What change, either in the bokeh code or the python code, should be made to correctly show the mean values for main_cat?
Your support is highly appreciated.
There probably is not a way. Donut was part of the old bokeh.charts API that was deprecated and subsequently removed from Bokeh last year. In particular, any problems, issues, or missing features will never receive any additional work. It is abandoned and unmaintained, and should not be used. If you want to use Bokeh to display donut charts, you can use the annular_wedge glyph to display the donut pieces explicitly:
from math import pi
import pandas as pd
from bokeh.io import output_file, show
from bokeh.palettes import Category20c
from bokeh.plotting import figure
from bokeh.transform import cumsum
x = { 'United States': 157, 'United Kingdom': 93, 'Japan': 89, 'China': 63,
'Germany': 44, 'India': 42, 'Italy': 40, 'Australia': 35,
'Brazil': 32, 'France': 31, 'Taiwan': 31, 'Spain': 29 }
data = pd.Series(x).reset_index(name='value').rename(columns={'index':'country'})
data['angle'] = data['value']/data['value'].sum() * 2*pi
data['color'] = Category20c[len(x)]
# Recent Bokeh releases use height= and legend_field=; older ones used plot_height= and legend=
p = figure(height=350)
p.annular_wedge(x=0, y=1, inner_radius=0.2, outer_radius=0.4,
                start_angle=cumsum('angle', include_zero=True), end_angle=cumsum('angle'),
                line_color="white", fill_color='color', legend_field='country', source=data)
show(p)
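On the calculation side of the question, the per-main_cat mean can be computed directly with pandas and then drawn with annular_wedge. The sketch below uses a small made-up train frame (the real data is not shown in the question) to contrast the true group mean with the sum of sub_cat means that the old Donut chart displayed.

```python
import pandas as pd

# Hypothetical stand-in for the question's `train` DataFrame
train = pd.DataFrame({
    "main_cat": ["A", "A", "A", "B", "B"],
    "sub_cat": ["a1", "a1", "a2", "b1", "b1"],
    "price": [10.0, 20.0, 30.0, 5.0, 15.0],
})

# Mean price per (main_cat, sub_cat), as in the question
sub_means = train.groupby(["main_cat", "sub_cat"]).price.mean()

# What the question describes seeing for main_cat: the sub_cat means added up
sum_of_sub_means = sub_means.groupby(level="main_cat").sum()

# What the question asks for: the mean price over all rows of each main_cat
main_means = train.groupby("main_cat").price.mean()

print(sum_of_sub_means)
print(main_means)
```

Note that the true main_cat mean is weighted by the number of rows in each sub_cat, so it generally differs from both the sum and the plain average of the sub_cat means.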
