The legend on my plot shows various entries; however, I can only see one dataset actually plotted in the graph. How can I show the rest?
The code that I am using is the following:
import pandas as pd
import matplotlib.pyplot as plt

HPI_data = pd.read_pickle('pandas_pickle.pickle')
print(HPI_data.head())
# Note: all data points print correctly up to this point, so I do not think there is anything wrong with the pickling.
HPI_data.plot()
plt.legend().remove()
plt.show()
# The plot shows only one dataset for some reason!
The following shows the actual output of print(HPI_data.head()):
House pricing index raw data
The output displayed is exactly how I want it to be; the issue I am dealing with is the plotting.
Ok, this is a bit of a funky idea, but is it possible to generate a dataset of points that, when plotted, display a desired text? I remember seeing this somewhere, but have no idea how to do it.
I'm trying to generate a plot for each department and security rating, but all I get is just one plot with the latest values retrieved from the dictionary holder.
My code is here:
https://codeshare.io/aYleWY
how to get:
graph
graph
graph
... but not as a subplot?
EDIT:
I corrected the code and fixed the bug that repeated the same data everywhere, but now my plots have lost the Y axis altogether. Any idea why?
You may add the statement plt.figure() in the beginning of your last for-loop in order to create a new figure on each iteration. Good luck!
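A minimal sketch of that suggestion follows. The original code is only linked, not shown, so the `holder` dictionary, its keys, and its values here are hypothetical stand-ins; only the `plt.figure()` call at the top of the loop comes from the answer above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical stand-in for the dictionary in the question:
# one list of values per (department, security rating) pair.
holder = {
    ("Sales", "A"): [1, 3, 2],
    ("Sales", "B"): [2, 2, 4],
    ("IT", "A"): [5, 1, 3],
}

figures = []
for (department, rating), values in holder.items():
    fig = plt.figure()  # start a new figure on every iteration
    plt.plot(values)    # draws into the figure that was just created
    plt.title(f"{department} / {rating}")
    figures.append(fig)

# In an interactive session, plt.show() would now open one window per figure.
```

Without the `plt.figure()` call, every `plt.plot` would draw into the same current figure, which would match the single-plot symptom described in the question.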
I am trying to count and plot the number of data points I have for each area by day, so far I have:
But I would like to show the number of instances of each county per day, with the end goal of plotting them on a line graph, like:
Only I would want to plot each county on its own line, rather than the total which I have plotted above.
Update:
I have managed to get this from the answers provided:
Which is great and exactly what I was looking for. However, in hindsight, this looks a little messy and not very descriptive even for the short period plotted, let alone if I were to plot a couple of years' worth of data.
So I'm thinking of plotting each county individually on an 8-panel grid. But when I try to plot this for one county, I get the boolean values, as below:
What would be the best way to plot only the True values?
You can try
df.county.groupby([df.date_stamp, df.county]).count().unstack().plot();
df.county...count() is the numerical series you want to plot.
groupby([df.date_stamp, df.county]) groups first by date_stamp, then by county (the order matters).
unstack will create a DataFrame whose index is the time stamp and whose columns are the counties.
plot(); will plot it (and the ; suppresses the unnecessary output).
Edit
To plot it on separate plots, you could do something like
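import matplotlib.pyplot as plt

for county in df.county.unique():
    # keep only the rows for this county, then count them per day
    this_county = df[df.county == county]
    this_county.county.groupby(this_county.date_stamp).count().plot()
    plt.title(county)
    plt.show()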
pd.crosstab(df['date_stamp'],df['county']).plot()
EDIT: question changed, if you want them in subplots instead of lines:
pd.crosstab(df['date_stamp'],df['county']).plot(subplots=True)
The key in drawing each county as a separate line is that each county needs to be in a different column. If you just want to count them, crosstab is then probably the shortest way to achieve that result. For example:
Then the result is:
When subplots=True:
My problem is as follows:
The user inputs two numbers between 2 and 25, and these numbers are used to create a grid. Every point on the grid has (x,y) coordinates. Based on the number of points the user chose, my Excel sheet is filled with up to 25x25 (x,y) coordinates.
Example: A 6x7 grid is chosen by the user, the table is filled with 42 (x,y) coordinates and all other values in the table are set to "".
Now I want to use a scatterplot with lines connecting each array to plot the data.
Problem 1: If I only select the 6x7 part of the table that has values in it and create the scatterplot, the result is correct, but only until the user specifies a different grid. With an 8x9 grid, for example, the graph is obviously missing two rows and two columns of input data.
Problem 2: If I select the entire 25x25 part of the table, including all the "" values, the graph axes get messed up. The y-axis works properly, but the x-axis shows sequential values (0-7) instead of the x-coordinates.
Problem 3: If I replace all the "" values in the table with 0 or NaN and plot the entire table, the axes are correct, but the lines between the scatter data get messed up.
Question:
Is there a way to automatically change the input data for the plot, or is there a way to correctly display the values on the x-axis if I select all the data?
Not sure this will work in your case, but it's worth a try, especially since no one's addressed your post in 3+ hours. I've had success with this approach: 1) charting the largest data set, 2) copying the resulting chart, and 3) trimming the data it draws from to produce all smaller data sets.
Getting this to work takes a lot of thought in laying out that largest data set so that all the other plots follow as needed. To illustrate, I've somewhat mimicked your data, and in the animated gif I show the largest data set plus 2 others produced by copying it. Then I demonstrate how to make the second one, including the rescaling required to make all plots scaled equally. Notice that I've arranged things so that only one set of x-values feeds all the series. If you can do this, it makes working with Excel's interface much easier.
After wrestling with it all night I came to the following solution:
Instead of setting all the empty cells to "" or zero, the cells should be set to #N/A (not available). The graph properly ignores the #N/A cells, exactly as I want it to, and updates when values are entered into them.
I need to get Excel to graph about 5 points into a single curve. What I'm looking for is the average of all the points if the points don't line up perfectly. I'm not very familiar with graphing in Excel, so it's very possible I've overlooked some option to get what I'm trying to achieve.
This is an example of what I'm looking for:
It looks like what you're looking to do is add a trendline to a scatterplot.
I've included some screenshots below.
Initial graph showing 3 groups of data we want to 'average':
Adding trendline to the points:
Trendline settings:
Final result:
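For anyone who wants to check the numbers outside Excel, a linear trendline is a least-squares straight-line fit. The sketch below assumes the linear trendline type (the settings shown in the screenshots are not reproduced here) and uses made-up points; it is not the data from the question.

```python
import numpy as np

# Hypothetical points that roughly follow a straight line.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Degree-1 polynomial fit: the least-squares line through the points.
slope, intercept = np.polyfit(x, y, 1)

# Fitted values along the line, one per original x.
trend = slope * x + intercept
print(slope, intercept)
```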