Fails to display certain columns data in Matplotlib - python-3.x

Given a dataframe as follows:
date,unit_value,unit_value_cumulative,daily_growth_rate
2019/1/29,1.0139,1.0139,0.22
2019/1/30,1.0057,1.0057,-0.81
2019/1/31,1.0122,1.0122,0.65
2019/2/1,1.0286,1.0286,1.62
2019/2/11,1.0446,1.0446,1.56
2019/2/12,1.0511,1.0511,0.62
2019/2/13,1.0757,1.0757,2.34
2019/2/14,1.0763,1.0763,0.06
2019/2/15,1.0554,1.0554,-1.94
2019/2/18,1.0949,1.0949,3.74
2019/2/19,1.0958,1.0958,0.08
I used the code below to plot them, but as you can see from the output image, one column doesn't appear on the plot.
df.plot(x='date', y=['unit_value', 'unit_value_cumulative', 'daily_growth_rate'], kind="line")
Output:
To plot unit_value only, I use: df.plot(x='date', y=['unit_value'], kind="line")
Out:
Could anyone help me figure out why it doesn't work when I plot three columns on the same plot? Thanks.

I just reproduced your results, and the plot actually works fine. In your case the values in the columns "unit_value" and "unit_value_cumulative" are identical, so the two lines overlap exactly and you only see the one drawn on top.
Apart from this, your current data suggests you made a mistake when calculating the cumulative values.
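To check this on your side, here is a minimal sketch that rebuilds the dataframe from the data in the question, tests whether the two columns are identical, and then draws the overlapping lines with different styles so both stay visible. Putting daily_growth_rate on a secondary axis is my own suggestion, since its scale differs a lot from the unit values; the Agg backend is only there so the sketch runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import pandas as pd

csv_text = """date,unit_value,unit_value_cumulative,daily_growth_rate
2019/1/29,1.0139,1.0139,0.22
2019/1/30,1.0057,1.0057,-0.81
2019/1/31,1.0122,1.0122,0.65
2019/2/1,1.0286,1.0286,1.62
2019/2/11,1.0446,1.0446,1.56
2019/2/12,1.0511,1.0511,0.62
2019/2/13,1.0757,1.0757,2.34
2019/2/14,1.0763,1.0763,0.06
2019/2/15,1.0554,1.0554,-1.94
2019/2/18,1.0949,1.0949,3.74
2019/2/19,1.0958,1.0958,0.08
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# True means one line is drawn exactly on top of the other
columns_identical = df["unit_value"].equals(df["unit_value_cumulative"])
print(columns_identical)

# A thick translucent line underneath and a dashed line on top keep both visible
ax = df.plot(x="date", y="unit_value", linewidth=4, alpha=0.4)
df.plot(x="date", y="unit_value_cumulative", linestyle="--", ax=ax)

# The growth rate has a different scale, so give it its own y-axis on the right
df.plot(x="date", y="daily_growth_rate", secondary_y=True, ax=ax)
```

If the check prints True, the plot is not dropping a column; the two series simply coincide.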

Related

Plotting the string value frequency in python

So I have this data frame related to species of spiders, and I wanted to see which 10 families of spiders occur most often. I used the code below to find out:
n=10
dfc['family'].value_counts()[:n].index.tolist()
I want to create a plot that shows how many of each of those top 10 species exist in the data frame. That is, I want a plot that says 300 of the first species and 200 of the second species exist in the data frame, just like this. But I cannot quite figure out the code for this.
Can anyone help me out with it?
Without knowing what your dataframe looks like, it is a little tough to give a precise answer, and I didn't check the code below on a dataframe because I didn't have one handy. (Also, I assume you are using pandas here.)
It sounds like you want (or at least could use) a dataframe that has a column of families and a second column holding the count of each family in the original. You can accomplish this with groupby(). Using size() with reset_index() keeps family as a regular column and puts the counts in a column named count:
df2 = dfc.groupby('family').size().reset_index(name='count')
If you then want to keep just the top 10 to make it easy to plot, you can use the nlargest() function in pandas, sorting on the count column (family holds names, not counts):
df2 = df2.nlargest(10, 'count')
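As an alternative, here is a minimal sketch that reuses the value_counts() call from the question and plots the result directly as a bar chart. The small dfc built here and its family names are made up for illustration, since the real dataframe isn't shown; the Agg backend is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import pandas as pd

# Hypothetical stand-in for the real spider dataframe
dfc = pd.DataFrame(
    {"family": ["Araneidae"] * 3 + ["Salticidae"] * 2 + ["Lycosidae"]}
)

n = 10
# value_counts() sorts by frequency, so head(n) keeps the n most common families
top_counts = dfc["family"].value_counts().head(n)

# One bar per family, bar height = number of rows with that family
ax = top_counts.plot(kind="bar")
ax.set_xlabel("family")
ax.set_ylabel("count")
```

With fewer than 10 distinct families, head(n) simply returns all of them.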

Unwanted spacing in a barplot

Inside my notebook I am reading data from a sqlite database using pandas.
The data is stored in consecutive order, meaning there is an entry for each day (no gaps). This is what a single entry looks like in the database:
Now when I try to plot this as a barplot (sns = seaborn), I get some strange gaps between the bars, which seem to be grouped somehow:
data['timestamp'] = data['timestamp'].dt.date
sns.barplot(x='timestamp', y='steps', data=data, ci=None)
I have been using the same datetime format for other plots and it worked fine, so I have ruled that out as the cause.
Please help me understand why those gaps occur in my plot. I would have expected the plot to look something like this (please ignore the colors):

Giving custom variable to `hue` in sns.pairplot (Seaborn)

I have the air quality (link here) dataset, which contains missing values. I've imputed them while creating a dummy dataframe [using df.isnull()] to keep track of the missing values.
My goal is to generate a pairplot using seaborn (or otherwise, if a simpler method exists) that gives a different color to the imputed values.
This is easy in matplotlib, where the parameter c of plt.scatter can be assigned a list of values and the points are colored accordingly (but the problem is that I can plot only one column against another, not a full pairplot). A possible solution is to iteratively create subplots for each pair of columns (which can make the code quite complicated!).
However, in Seaborn (which already has a built-in function for pairplot) you are supposed to provide hue='column-name', which is not possible in this case: the missingness is stored in the dummy dataframe, and I would need to retrieve the corresponding columns for color coding.
Please let me know how I can accomplish this in the simplest manner possible.

How to display data from two columns in chartify heatmap?

Using the example from the documentation, the heatmap is built and displays the total_price in each cell. I want to add data from another column, e.g. 'fruit' to be displayed below the total_price in each cell. How do I do that?
Adding screenshot of where, ideally, the data would be displayed:
import chartify

# Generate example data
data = chartify.examples.example_data()
average_price_by_fruit_and_country = (data.groupby(
    ['fruit', 'country'])['total_price'].mean().reset_index())

# Plot the data
(chartify.Chart(
    blank_labels=True,
    x_axis_type='categorical',
    y_axis_type='categorical')
 .plot.heatmap(
    data_frame=average_price_by_fruit_and_country,
    x_column='fruit',
    y_column='country',
    color_column='total_price',
    text_column='total_price',
    text_color='white')
 .axes.set_xaxis_label('Fruit')
 .axes.set_yaxis_label('Country')
 .set_title('Heatmap')
 .set_subtitle("Plot numeric value grouped by two categorical values")
 .show('png'))
Unfortunately there's not an easy solution at the moment, but I'll add an issue to make it easier to solve for this use case in the future.
You can access the Bokeh figure from ch.figure, then use Bokeh's text plot to achieve what you're looking for. Take a look at the source code for an example here: https://github.com/spotify/chartify/blob/master/chartify/_core/plot.py#L26

Hide wrong values of a graph

I am building graphs using Matplotlib, and I sometimes have wrong values in my CSV files. They create spikes in my graph that I would like to suppress. Sometimes I also have lots of zeros (when the sensor is disconnected), and I would prefer the graph to show blank spaces rather than wrong zeros that could be interpreted as real values.
Forgive me, for I'm not familiar with matplotlib, but I'm presuming that you're reading the CSV file directly into matplotlib. If so, is there an option to read the CSV file into your app as a list of ints or as a string and then do the data validation before passing that data to the library?
Apologies if my idea is not applicable.
I found a way that works:
I used xlim to set my max and min x values, and then I set all the values that I didn't want to NaN!
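For anyone landing here, a minimal sketch of that NaN approach: Matplotlib leaves a gap in a line wherever a value is NaN, so replacing the unwanted readings with NaN hides them. The readings and the valid range below are made up for illustration; the Agg backend is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sensor readings: two zeros (sensor disconnected) and one spike
readings = np.array([20.1, 20.4, 0.0, 0.0, 20.9, 250.0, 21.2, 21.0])

# Assumed plausible range for this sensor; adjust to your own data
valid_min, valid_max = 1.0, 100.0

# Anything outside the range becomes NaN, which Matplotlib does not draw
out_of_range = (readings < valid_min) | (readings > valid_max)
cleaned = np.where(out_of_range, np.nan, readings)

fig, ax = plt.subplots()
ax.plot(cleaned, marker="o")          # markers keep isolated points visible
ax.set_xlim(0, len(cleaned) - 1)      # fix the x range explicitly
```

The marker matters here: a single valid point between two NaNs has no neighbors to connect to, so without a marker it would not show up at all.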
