Unwanted spacing in a barplot - python-3.x

Inside my notebook I am reading data from a sqlite database using pandas.
The data is stored in consecutive order, meaning there is an entry for each day (no gaps). This is what a single entry looks like in the database:
Now when I try to plot this as a barplot (sns = seaborn), I get some strange gaps between the data, which seems to be grouped somehow:
data['timestamp'] = data['timestamp'].dt.date
sns.barplot(x='timestamp', y='steps', data=data, ci=None)
I have been using the same datetime format for other plots and it worked fine, so I have ruled that out as the cause.
Please help me understand why those gaps occur in my plot. I would have expected the plot would look something like this (please ignore the colors):

Related

Plotting the string value frequency in python

I have a data frame of spider species, and I wanted to find the 10 most frequently occurring spider families. I used the code below to find them:
n=10
dfc['family'].value_counts()[:n].index.tolist()
I want to create a plot that shows how many of each of those top 10 families exist in the data frame. That is, I want a plot that says 300 of the first family exist and 200 of the second family exist in the data frame, just like this. But I cannot quite figure out the code for this.
Can anyone help me out with it?
Not knowing what your dataframe looks like at all, it is a little tough to give a precise answer, and I didn't check the below code on a dataframe because I didn't have something handy. (also, I assume you are using pandas here).
It sounds like you want (or at least could use) a dataframe that has a column of families and a second column holding the count of each family in the original. You can accomplish this with groupby().
df2 = dfc.groupby('family').size().reset_index(name='count')
If you then want to keep only the top 10 to make it easy to plot, you can use the nlargest() function in pandas on the count column.
df2 = df2.nlargest(10, 'count')
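As a rough sketch of the whole thing, assuming dfc has a 'family' column (the sample data and the small n below are made up for illustration): value_counts() already returns the counts sorted in descending order, so the top n can be plotted directly as a bar chart.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import pandas as pd

# Made-up sample data standing in for the real dfc
dfc = pd.DataFrame({
    "family": ["Araneidae"] * 5 + ["Salticidae"] * 3
              + ["Lycosidae"] * 2 + ["Theridiidae"]
})

n = 3  # use n = 10 on the real data

# value_counts() sorts by frequency, so head(n) keeps the n most common families
top_counts = dfc["family"].value_counts().head(n)

# One bar per family, bar height = number of rows for that family
ax = top_counts.plot(kind="bar")
ax.set_xlabel("family")
ax.set_ylabel("count")
```

Here top_counts is a Series indexed by family name, so the same object serves both as the table of counts and as the input to the plot.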

Pandas DateTime Interpretation Issues & Plotting with axvline

Recently I've been using pd.to_datetime to convert date-time series data into the format I need for a matplotlib plot. I hadn't had any issues until now: for an unknown reason, a date-time error is thrown when I use axvline (Matplotlib). However, it is only thrown when my code is executed inside a function, not when it is run in the console. After trying for the past two days, I've decided to ask on SO for some guidance.
The expected output is:
The dotted red line indicates a single failure date.
My function follows the logic below:
for i in result:
    Asset_Num = i.strip(".csv")
    dataframe = pd.read_csv(i)
    df2 = dataframe[(dataframe['Asset_Number']==Asset_Num)]
    #Convert the Date/Time object into a list of Time Stamp Values
    fdl = pd.to_datetime(df2['Date/Time'], format="%Y-%m-%d %H:%M:%S").to_list()
    ax = dataframe.plot(linewidth=2)
    ax.axvline(fdl, color='r', linestyle="--")
The error that is then thrown as I run this function is:
But I am confused, as the x-axis and the fdl variable are both in the same date-time format, and I have checked this numerous times.
What am I doing wrong?
I've attached some sample data as per our minimum criteria guidelines if you'd like to try to recreate this. (http://www.sharecsv.com/s/292b419dc674302ac5b6a96a2da0e06e/SampleData.csv)
Thank you.
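One possible cause, offered as an assumption since the error message is not shown: axvline draws a vertical line at a single x position, whereas fdl is a list. A minimal sketch with made-up data (the column name, dates, and values below are illustrative and not taken from the linked CSV) that draws one line per failure timestamp:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up readings indexed by timestamp
index = pd.date_range("2020-01-01", periods=10, freq="D")
readings = pd.DataFrame({"value": range(10)}, index=index)

# Hypothetical failure dates, parsed the same way as in the question
failure_dates = pd.to_datetime(
    pd.Series(["2020-01-04 00:00:00", "2020-01-08 00:00:00"]),
    format="%Y-%m-%d %H:%M:%S",
).to_list()

fig, ax = plt.subplots()
ax.plot(readings.index, readings["value"], linewidth=2)

# axvline takes one x position per call, so loop over the list
for failure_date in failure_dates:
    ax.axvline(failure_date, color="r", linestyle="--")
```

If there is only ever one failure date, passing fdl[0] instead of the whole list would be the equivalent change in the original function.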

Fails to display certain columns data in Matplotlib

Given a dataframe as follows:
date,unit_value,unit_value_cumulative,daily_growth_rate
2019/1/29,1.0139,1.0139,0.22
2019/1/30,1.0057,1.0057,-0.81
2019/1/31,1.0122,1.0122,0.65
2019/2/1,1.0286,1.0286,1.62
2019/2/11,1.0446,1.0446,1.56
2019/2/12,1.0511,1.0511,0.62
2019/2/13,1.0757,1.0757,2.34
2019/2/14,1.0763,1.0763,0.06
2019/2/15,1.0554,1.0554,-1.94
2019/2/18,1.0949,1.0949,3.74
2019/2/19,1.0958,1.0958,0.08
I have used the code below to plot them, but as you can see from the output image, one column doesn't display on the plot.
df.plot(x='date', y=['unit_value', 'unit_value_cumulative', 'daily_growth_rate'], kind="line")
Output:
To plot unit_value only, I use: df.plot(x='date', y=['unit_value'], kind="line")
Out:
Could anyone help me figure out why it doesn't work when I plot three columns on the same plot? Thanks.
I just reproduced your results and it actually does work fine. In your case the values of the columns "unit_value" and "unit_value_cumulative" are identical, which is why you only see the one in front.
Apart from this, your current data suggests you made a mistake when calculating the cumulative values.
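A quick way to confirm this, sketched with the first few rows from the question, is to compare the two columns directly. If they are equal, the second line is drawn exactly on top of the first.

```python
import io

import pandas as pd

# First rows of the data shown in the question
csv_text = """date,unit_value,unit_value_cumulative,daily_growth_rate
2019/1/29,1.0139,1.0139,0.22
2019/1/30,1.0057,1.0057,-0.81
2019/1/31,1.0122,1.0122,0.65
2019/2/1,1.0286,1.0286,1.62
"""
df = pd.read_csv(io.StringIO(csv_text))

# True means the two lines overlap completely on the plot
columns_identical = df["unit_value"].equals(df["unit_value_cumulative"])
print(columns_identical)  # True
```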

How to display data from two columns in chartify heatmap?

Using the example from the documentation, the heatmap is built and displays the total_price in each cell. I want to add data from another column, e.g. 'fruit' to be displayed below the total_price in each cell. How do I do that?
Adding screenshot of where, ideally, the data would be displayed:
import chartify

# Generate example data
data = chartify.examples.example_data()
average_price_by_fruit_and_country = (
    data.groupby(['fruit', 'country'])['total_price'].mean().reset_index())

# Plot the data
(chartify.Chart(
    blank_labels=True,
    x_axis_type='categorical',
    y_axis_type='categorical')
 .plot.heatmap(
    data_frame=average_price_by_fruit_and_country,
    x_column='fruit',
    y_column='country',
    color_column='total_price',
    text_column='total_price',
    text_color='white')
 .axes.set_xaxis_label('Fruit')
 .axes.set_yaxis_label('Country')
 .set_title('Heatmap')
 .set_subtitle("Plot numeric value grouped by two categorical values")
 .show('png'))
Unfortunately there's not an easy solution at the moment, but I'll add an issue to make it easier to solve for this use case in the future.
You can access the Bokeh figure from ch.figure then use bokeh's text plot to achieve what you're looking for. Take a look at the source code for an example here. https://github.com/spotify/chartify/blob/master/chartify/_core/plot.py#L26

Hide wrong values of a graph

I am building graphs using Matplotlib, and I sometimes have wrong values in my CSV files. These create spikes in my graph that I would like to suppress. I also sometimes have lots of zeros (when the sensor is disconnected), and I would prefer the graph to show blank spaces rather than wrong zeros that could be interpreted as real values.
Forgive me, for I'm not familiar with matplotlib, but I'm presuming that you're reading the CSV file directly into matplotlib. If so, is there an option to read the CSV file into your app as a list of ints or as a string, and then do the data validation before passing it to the library?
Apologies if my idea is not applicable.
I found a way that works:
I used xlim to set my min and max x values, and then I set all the values that I didn't want to NaN.
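A minimal sketch of the NaN part, assuming the readings are in a pandas Series (the sample values and the SPIKE_LIMIT threshold are made up for illustration): zeros and values above the threshold are replaced with NaN before plotting.

```python
import pandas as pd

# Made-up sensor readings: 0.0 = sensor disconnected, 500.0 = spike
values = pd.Series([20.1, 20.3, 0.0, 0.0, 20.4, 500.0, 20.6])

SPIKE_LIMIT = 100.0  # hypothetical threshold; pick one that suits the sensor

# mask() replaces every entry where the condition is True with NaN
cleaned = values.mask((values == 0) | (values > SPIKE_LIMIT))
```

Matplotlib does not draw line segments through NaN values, so plotting cleaned leaves blank gaps where the zeros and the spike used to be.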
