How to display data from two columns in a chartify heatmap?

Using the example from the documentation, I built a heatmap that displays total_price in each cell. I want to add data from another column, e.g. 'fruit', so that it is displayed below total_price in each cell. How do I do that?
I've added a screenshot of where, ideally, the extra data would be displayed. Here is the code:
import chartify

# Generate example data
data = chartify.examples.example_data()
average_price_by_fruit_and_country = (data.groupby(
    ['fruit', 'country'])['total_price'].mean().reset_index())

# Plot the data
(chartify.Chart(
    blank_labels=True,
    x_axis_type='categorical',
    y_axis_type='categorical')
 .plot.heatmap(
    data_frame=average_price_by_fruit_and_country,
    x_column='fruit',
    y_column='country',
    color_column='total_price',
    text_column='total_price',
    text_color='white')
 .axes.set_xaxis_label('Fruit')
 .axes.set_yaxis_label('Country')
 .set_title('Heatmap')
 .set_subtitle("Plot numeric value grouped by two categorical values")
 .show('png'))

Unfortunately there's not an easy solution at the moment, but I'll add an issue to make it easier to solve for this use case in the future.
You can access the underlying Bokeh figure via ch.figure and then use Bokeh's text glyph to add the extra labels yourself. For an example, take a look at the source code here: https://github.com/spotify/chartify/blob/master/chartify/_core/plot.py#L26
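A minimal sketch of that approach, assuming chartify's heatmap draws each cell label centred on its (x, y) category pair (as the existing text_column labels appear to be). The helper name add_secondary_labels and its y_offset and font-size defaults are hypothetical, not part of chartify; it only calls Bokeh's ColumnDataSource and figure.text, so it works on ch.figure or on any Bokeh figure with categorical axes.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure


def add_secondary_labels(fig, data_frame: pd.DataFrame, x_column, y_column,
                         label_column, y_offset=14, text_color="white"):
    """Hypothetical helper: draw label_column in each heatmap cell, shifted below centre.

    fig is any Bokeh figure with categorical x/y ranges, e.g. ch.figure from chartify.
    """
    # Categorical coordinates are the factor strings themselves
    source = ColumnDataSource({
        "x": data_frame[x_column].astype(str).tolist(),
        "y": data_frame[y_column].astype(str).tolist(),
        "label": data_frame[label_column].astype(str).tolist(),
    })
    # y_offset is in screen pixels; positive values push the text down,
    # so the new label sits under the existing centred label
    return fig.text(x="x", y="y", text="label", source=source,
                    y_offset=y_offset, text_align="center",
                    text_baseline="middle", text_color=text_color,
                    text_font_size="10px")


# Standalone demo on a plain Bokeh figure with categorical axes
demo_fig = figure(x_range=["Apple", "Banana"], y_range=["US", "JP"])
demo_df = pd.DataFrame({"fruit": ["Apple", "Banana"], "country": ["US", "JP"]})
renderer = add_secondary_labels(demo_fig, demo_df, "fruit", "country", "fruit")
```

With the question's code, keep a reference to the chart instead of chaining straight through to .show(): build ch = chartify.Chart(...), call ch.plot.heatmap(...) as before, then add_secondary_labels(ch.figure, average_price_by_fruit_and_country, 'fruit', 'country', 'fruit'), and finally ch.show('png'). The offset and font size will likely need tuning to the cell size.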

Related

Grouping Rows by variables in KableExtra

This is my first question here, so I am not sure how to word things. I am looking into the kableExtra package to create tables different from the ones I currently know how to make in gt. This is my table output from gt.
Now I am trying to group my kable table in a similar way. The data set is exactly what you see, plus a column I have named "x" that contains Prevalence, Abundance, or Intensity for each row. Is there a way to get similar output from a kable table? The difficulty is that this gt table is so long it doesn't fit well in a document. Thank you for any help.

Alternatives for interpolating three-dimensional data

I have a table that gives a chemical concentration value based on temperature, pH and ammonia. Because of the way I measure these variables, the ammonia level is always one of six values (across the top of the table), so it works as a categorical variable.
I need a way to interpolate on this table based on these three variables. I tried a combination of INDEX and MATCH, but could not achieve what I wanted. Then I thought of "dividing" the table into intervals to "reduce" one variable, and using an IF function to select which interval to interpolate in based on the third variable (pH or ammonia), but I can't figure out how to change the intervals dynamically like this.
Can anyone think of an alternative to accomplish what I'm trying to do? If possible I would like to avoid using VBA, but if there is no other way I have no problem using it.
Thank you for the help!
I'm attaching an example of the table below.
Assuming that pH is in column A:
=INDEX(A:H;MATCH(6,8;A:A;0)+MATCH(25;B:B;0)-2;MATCH(2;2:2;0))
Here the -2 needs to be changed to the number of rows BEFORE the first 22 in the Temp column.
This also assumes that the 22;25;28 pattern in Temp is the same for every pH. (The formula uses semicolons as argument separators and a decimal comma, as in European locale settings.)
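The formula above returns an exact table cell; it does not interpolate between rows. To illustrate the interpolation logic itself (in Python rather than Excel), here is a minimal sketch assuming ammonia is matched exactly as a category while pH and temperature are interpolated linearly between the bracketing table rows (bilinear interpolation). The grid values in TABLE are hypothetical placeholders, not the asker's data; only the 22/25/28 temperatures and the ammonia value 2 come from the answer.

```python
from bisect import bisect_right

# Hypothetical example table: TABLE[ammonia][i][j] is the concentration
# for PH_GRID[i] and TEMP_GRID[j]. Replace with the real values.
PH_GRID = [6.8, 7.0, 7.2]
TEMP_GRID = [22, 25, 28]
TABLE = {
    2: [[1.0, 2.0, 3.0],
        [2.0, 3.0, 4.0],
        [3.0, 4.0, 5.0]],
}


def _bracket(grid, v):
    """Return (i, t): lower index into the sorted grid and the fraction between i and i + 1."""
    if not grid[0] <= v <= grid[-1]:
        raise ValueError(f"{v} is outside the table range {grid[0]}..{grid[-1]}")
    # Clamp so that i + 1 stays a valid index even when v equals the last grid value
    i = min(bisect_right(grid, v) - 1, len(grid) - 2)
    t = (v - grid[i]) / (grid[i + 1] - grid[i])
    return i, t


def interpolate(table, ph_grid, temp_grid, ph, temp, ammonia):
    """Bilinear interpolation over pH and temperature; ammonia selects the sub-table exactly."""
    values = table[ammonia]  # categorical: no interpolation across ammonia levels
    i, tp = _bracket(ph_grid, ph)
    j, tt = _bracket(temp_grid, temp)
    # Interpolate along temperature on the two bracketing pH rows...
    low = values[i][j] * (1 - tt) + values[i][j + 1] * tt
    high = values[i + 1][j] * (1 - tt) + values[i + 1][j + 1] * tt
    # ...then along pH between those two results
    return low * (1 - tp) + high * tp
```

In a worksheet, the same two steps could be reproduced with MATCH (approximate mode) to find the lower bracketing row and column, then the two weighted averages; whether that stays manageable without VBA depends on the table layout.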

One column's data fails to display in Matplotlib

Given a dataframe as follows:
date,unit_value,unit_value_cumulative,daily_growth_rate
2019/1/29,1.0139,1.0139,0.22
2019/1/30,1.0057,1.0057,-0.81
2019/1/31,1.0122,1.0122,0.65
2019/2/1,1.0286,1.0286,1.62
2019/2/11,1.0446,1.0446,1.56
2019/2/12,1.0511,1.0511,0.62
2019/2/13,1.0757,1.0757,2.34
2019/2/14,1.0763,1.0763,0.06
2019/2/15,1.0554,1.0554,-1.94
2019/2/18,1.0949,1.0949,3.74
2019/2/19,1.0958,1.0958,0.08
I used the code below to plot them, but as you can see from the output image, one column doesn't appear on the plot.
df.plot(x='date', y=['unit_value', 'unit_value_cumulative', 'daily_growth_rate'], kind="line")
To plot unit_value only, I use: df.plot(x='date', y=['unit_value'], kind="line")
Could anyone help me figure out why it doesn't work when I plot three columns on the same plot? Thanks.
I just reproduced your results, and it actually works fine. In your data, the values of the columns "unit_value" and "unit_value_cumulative" are identical, which is why you only see the one drawn in front.
Apart from this, your data suggests you made a mistake when calculating the cumulative values, since unit_value_cumulative is exactly equal to unit_value on every row.
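A quick way to confirm the overlap and still see every series, sketched with pandas and Matplotlib on the question's own data: check the two columns for equality, draw the overlapping pair with different line styles, and move daily_growth_rate (which appears to be a percentage, on a very different scale) to a secondary y axis. The styling choices are just one option.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

CSV = """date,unit_value,unit_value_cumulative,daily_growth_rate
2019/1/29,1.0139,1.0139,0.22
2019/1/30,1.0057,1.0057,-0.81
2019/1/31,1.0122,1.0122,0.65
2019/2/1,1.0286,1.0286,1.62
2019/2/11,1.0446,1.0446,1.56
2019/2/12,1.0511,1.0511,0.62
2019/2/13,1.0757,1.0757,2.34
2019/2/14,1.0763,1.0763,0.06
2019/2/15,1.0554,1.0554,-1.94
2019/2/18,1.0949,1.0949,3.74
2019/2/19,1.0958,1.0958,0.08
"""

df = pd.read_csv(io.StringIO(CSV), parse_dates=["date"])

# The diagnosis: both value columns hold exactly the same numbers
identical = df["unit_value"].equals(df["unit_value_cumulative"])

fig, ax = plt.subplots()
# A thick solid line underneath and a dashed line on top keep both series visible
df.plot(x="date", y="unit_value", ax=ax, linewidth=4)
df.plot(x="date", y="unit_value_cumulative", ax=ax, linestyle="--", color="black")
# Growth rate is on a different scale, so give it its own axis on the right
df.plot(x="date", y="daily_growth_rate", ax=ax, secondary_y=True)

buf = io.BytesIO()
fig.savefig(buf, format="png")
```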

Giving custom variable to `hue` in sns.pairplot (Seaborn)

I have the air quality dataset (link here), which contains missing values. I've imputed them, and created a dummy dataframe (using df.isnull()) to keep track of which values were missing.
My goal is to generate a pairplot using seaborn (or another method, if a simpler one exists) that uses a different color for the imputed values.
This is easy in matplotlib, where the c parameter of plt.scatter can be assigned a list of values to color each point (but then I can only plot two columns against each other, not a pairplot). One possible solution is to iteratively create subplots for each pair of columns, which can make the code quite complicated.
However, seaborn's built-in pairplot expects hue='column-name', which doesn't work directly here: the missingness is stored in the separate dummy dataframe, and the corresponding columns would need to be retrieved for color coding.
Please let me know how I can accomplish this in the simplest manner possible.

Excel multiple lookup array

I have a table from which I would like to select the smallest picture frame that could be used based on the size values, i.e. return the smallest frame that would fit the image.
So far I have an array formula that selects the smallest frame that fits the size requirements, but I also need one column to stay fixed: an additional match so that only frames with the same type ID are considered.
My current formula is as follows:
= INDEX($A$2:$A$16,MATCH(4,MMULT((I2:L2<=$B$2:$E$16)+0,{1;1;1;1}),0))
At the moment I am just referencing the type as another lookup, but I would like the formula to only attempt to match frames of the corresponding type. Currently, if the size is larger than anything available within the correct type, it selects a frame of another type that has that size available.
I've tried to show what I mean in the screenshot: I want it to only pick up type 1, but it is selecting type 3 because MMULT finds that it is the only one that would fit.
Help is much appreciated!
Thanks!
If the frame sizes to be looked up are in ascending order, you could use an array formula like this (confirm it with Ctrl+Shift+Enter in versions of Excel without dynamic arrays):
=INDEX($A$2:$A$4,MIN(IF((($B$2:$B$4>=F2)*($C$2:$C$4>=G2)*($D$2:$D$4>=H2)),ROW($C$2:$C$4)))-1,1)
based on this sort of data layout
I ended up using a load of nested IF statements to section off the types, which made it simpler to code.
Thanks anyway peeps!
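The lookup the question describes (restrict to one type, then take the first frame whose every dimension is at least the required size) can be sketched in Python to make the logic explicit. This assumes, like the MATCH-based formulas, that frames are listed smallest first within each type, so the first match is the smallest. The Frame records and their sizes are hypothetical, and the four dimensions stand in for the compared columns B:E.

```python
from typing import List, NamedTuple, Optional, Tuple


class Frame(NamedTuple):
    name: str
    type_id: int
    dims: Tuple[float, ...]  # compared position by position, like columns B:E


def smallest_fitting_frame(frames: List[Frame], type_id: int,
                           required: Tuple[float, ...]) -> Optional[str]:
    """Return the first frame of type_id whose every dimension is >= the requirement."""
    for frame in frames:
        if frame.type_id != type_id:
            continue  # the fixed "type" condition: never fall back to another type
        # Equivalent of MATCH(4, MMULT(...)): all four comparisons must be true
        if all(have >= need for have, need in zip(frame.dims, required)):
            return frame.name
    return None  # nothing of this type is big enough


# Hypothetical frame table, smallest first within each type
FRAMES = [
    Frame("F1", 1, (10, 10, 2, 1)),
    Frame("F2", 1, (20, 20, 2, 1)),
    Frame("F3", 3, (40, 40, 3, 2)),
]
```

Returning None (a "no fit" result) when no frame of the requested type is large enough is the behavior the question asks for, instead of silently selecting a frame of type 3.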
