I have managed to plot two different data sets on the same axis; however, I'd also like to plot another line showing their average.
The main problem is that the two data sets have different X (time) values, so it's not possible to add an average column at the end and plot that. (See the highlighted row 22, for example: the corresponding Time values are different.)
Is there any way I can plot an average of two plots on the same axis?
One idea that might work is to place the values of both series, one above the other, in two new columns, sort this new data by time, smooth it, then plot the smoothed combined data. Alternatively, you could do the smoothing by simply plotting the new sorted series, adding a moving average trendline to it, then changing the formatting of the new series so that it is no longer visible (but the trendline is). Something like this:
In the above picture, series 3 is the plot of the sorted aggregate data of series 1 and 2. If you change the formatting of series 3 so that there is no line, you get something like this:
For my relatively small mock data sets, the results are admittedly poor (it was based on just 25 data points in each series), but if you have a large amount of closely spaced data, and you play around with the moving average window size, you might get something acceptable. If not, you should probably just interpolate both datasets to obtain two consistent time series.
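The interpolation route can be sketched outside Excel as well. The following is only an illustration in plain Python, assuming each series is a pair of lists with sorted time values; the function names `interpolate` and `average_series` are made up for this sketch. It evaluates both series on the union of their time stamps (restricted to the range where they overlap) and averages them point by point.

```python
from bisect import bisect_left


def interpolate(times, values, t):
    """Linearly interpolate the series (times, values) at time t.

    Assumes `times` is sorted in increasing order; values outside the
    covered range are clamped to the nearest end point.
    """
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    i = bisect_left(times, t)  # first index with times[i] >= t
    t0, t1 = times[i - 1], times[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def average_series(t1, y1, t2, y2):
    """Average two series that were sampled at different times."""
    # Only average where both series actually have data.
    lo = max(t1[0], t2[0])
    hi = min(t1[-1], t2[-1])
    # Common grid: every time stamp from either series inside the overlap.
    grid = sorted({t for t in list(t1) + list(t2) if lo <= t <= hi})
    avg = [(interpolate(t1, y1, t) + interpolate(t2, y2, t)) / 2 for t in grid]
    return grid, avg


# Hypothetical data: series 1 sampled at 0, 2, 4 and series 2 at 1, 3.
grid, avg = average_series([0, 2, 4], [0, 2, 4], [1, 3], [10, 30])
print(grid, avg)
```

The resulting `grid` and `avg` lists form one consistent time series that can be plotted as a third line next to the two originals.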
Related
I am trying to count and plot the number of data points I have for each area by day, so far I have:
But I would like to show the number of instances of each county per day, with the end goal of plotting them on a line graph, like:
Only I would want to plot each county on its own line, rather than the total which I have plotted above.
Update:
I have managed to get this from the answers provided:
Which is great and exactly what I was looking for. However, in hindsight, this looks a little messy and not very descriptive even for the short period plotted, let alone if I were to plot this for a couple of years' worth of data.
So I'm thinking of plotting each county individually on an 8-panel grid. But when I try to plot this for one county I get the boolean values, as below:
What would be the best way to plot only the True values?
You can try
df.county.groupby([df.date_stamp, df.county]).count().unstack().plot();
df.county...count() is the numerical series you want to plot.
groupby([df.date_stamp, df.county]) groups first by date_stamp, then by county (the order matters).
unstack will create a Dataframe whose index is the time stamp, and columns are counties.
plot(); will plot it (and the ; suppresses the unnecessary output).
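To make the intermediate steps visible, here is a small sketch on made-up data (the county names and dates are hypothetical; only the column names `date_stamp` and `county` come from the question). The `fill_value=0` argument is an addition not in the one-liner above; it shows days without any rows for a county as 0 instead of NaN.

```python
import pandas as pd

# Hypothetical sample data, one row per recorded data point.
df = pd.DataFrame({
    "date_stamp": pd.to_datetime(
        ["2014-01-01", "2014-01-01", "2014-01-01", "2014-01-02", "2014-01-02"]
    ),
    "county": ["Kent", "Kent", "Essex", "Essex", "Essex"],
})

# Step 1: count rows per (date_stamp, county) pair.
# The result is a Series with a two-level index.
counts = df.county.groupby([df.date_stamp, df.county]).count()

# Step 2: move the county level into the columns.
# Rows are now dates, and each county has its own column.
wide = counts.unstack(fill_value=0)
print(wide)
```

Calling `wide.plot()` on this table then draws one line per column, that is, one line per county.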
Edit
To plot it on separate plots, you could do something like
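import matplotlib.pyplot as plt

for county in df.county.unique():
    # Keep only the rows belonging to this county
    this_county = df[df.county == county]
    # Count the rows per day and plot them on their own figure
    this_county.county.groupby(this_county.date_stamp).count().plot()
    plt.title(county)
    plt.show()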
pd.crosstab(df['date_stamp'],df['county']).plot()
EDIT: question changed, if you want them in subplots instead of lines:
pd.crosstab(df['date_stamp'],df['county']).plot(subplots=True)
The key in drawing each county as a separate line is that each county needs to be in a different column. If you just want to count them, crosstab is then probably the shortest way to achieve that result. For example:
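A minimal sketch of that table, using made-up sample data (the county names and dates are hypothetical; only the column names come from the question):

```python
import pandas as pd

# Hypothetical sample data, one row per recorded data point.
df = pd.DataFrame({
    "date_stamp": pd.to_datetime(
        ["2014-01-01", "2014-01-01", "2014-01-01", "2014-01-02", "2014-01-02"]
    ),
    "county": ["Kent", "Kent", "Essex", "Essex", "Essex"],
})

# crosstab counts how often each (date_stamp, county) combination occurs:
# one row per date, one column per county, 0 where a county has no rows.
table = pd.crosstab(df["date_stamp"], df["county"])
print(table)
```

Because every county ends up in its own column, `table.plot()` draws one line per county and `table.plot(subplots=True)` draws one panel per county.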
Then the result is:
When subplots=True:
I'm plotting a column of data which represents a time series in gnuplot. Every value represents a time value after 500 iterations / time units. Can I tell gnuplot to multiply the x-values it displays by 500?
I thought this would be a standard problem since every time one has to plot a time series one needs to tell the plotting program what time unit each iteration has.
I don't want to create an extra column with x-values manually, since I have a lot of different data sets of different lengths. I don't want to create an x column for every one of them.
If you have only a single column, gnuplot uses the row number as x value. This can be accessed by the pseudo column 0 and scaled like
plot 'datafile' using ($0*500):1
or equivalently, if you're calling this from a shell script (where the shell may expand $0 itself)
plot 'datafile' using (column(0)*500):1
My problem is as follows:
The user inputs two numbers between 2 and 25, these numbers are used to create a grid. Every point on the grid has (x,y) coordinates. Based on the amount of points the user chose, my excel sheet is filled up with up to 25x25 (x,y) coordinates.
Example: A 6x7 grid is chosen by the user, the table is filled with 42 (x,y) coordinates and all other values in the table are set to "".
Now I want to use a scatterplot with lines connecting each array to plot the data.
Problem 1: If I only select the 6x7 part of the table that has values in it and create the scatterplot, the result is correct, but only until the user specifies a different grid, for example 8x9; then the graph is obviously missing two rows and two columns of input data.
Problem 2: If I select the entire 25x25 part of the table, including all the "" values, the graph axes get messed up. The y-axis works properly, but the x-axis shows sequential values (0-7) instead of the x-coordinates.
Problem 3: If I replace all the "" values in the table to 0 or NaN and plot the entire table the axes are correct, but the lines between the scatter data get messed up.
Question:
Is there a way to automatically change the input data for the plot, or is there a way to correctly display the values on the x-axis if I select all the data?
Not sure this will work in your case, but it's worth a try, especially since no one's addressed your post in 3+ hours. I've had success with this approach: 1) charting the largest data set, 2) copying the resulting chart, and 3) trimming the data it draws from to produce all smaller data sets.
Getting this to work takes a lot of thought in laying out that largest data set so that all the other plots follow as needed. To illustrate, I've somewhat mimicked your data, and in the animated gif I show the largest data set plus 2 others produced by copying it. Then I demonstrate how to make the second one, including the rescaling required to make all plots scaled equally. Notice that I've arranged things so that only one set of x-values feeds all the series. If you can do this, it makes working with Excel's interface much easier.
After wrestling with it all night I came to the following solution:
Instead of setting all the empty cells to "" or zero, the cells should be set to #N/A (not available). The graph properly ignores the #N/A cells, exactly like I want it to, and updates when values are entered into them.
I have two sets of data that I wish to plot on the same chart in Excel 2013.
The first data set is time series data and has about 100 daily observations. I would like to plot this as a line chart.
The second data set only has 6 data points, and I wish to plot these as columns. Is this possible when the number of observations in each data set is different?
I know it can be done if you have the same number of observations in both data sets.
You will make things easier if you give them the same categories and use blanks to skip missing entries. Excel is not very smart about matching categories between different sets of data unless you are using scatter plots.
I have 2 data series, which record how much a user is meditating/attentive (out of 100), plotted onto a graph. The x axis is the number of seconds since the start of the experiment, and the y axis shows the value for meditation/attention at that point in time.
I have a 3rd set of data that is a series of key timestamps during the experiment (not exactly matching the timestamps from attention/meditation values).
I want to create a graph where you can compare how the attention/meditation values change at the key points.
I don't care whether the key points are highlighted by a line or by dots. I tried adding the 3rd data set on a secondary axis, but it still uses the original x-axis of the main graph, and I don't know how to make Excel do what I want.
Thanks in advance
You should use an XY Scatter chart, not a line chart. A line chart ignores any numerical value in the X values, treats each X value as a text label, and uses the X values from the first series as X values for all series.
You can format the first two series so that they use lines and not markers, and the third so it uses markers without lines.
You may find this link helpful: superuser.com/questions/825692 You don't need to use the secondary axis, just add another series with tag times and constant 45 value, then format vertical error bars to 100% and horizontal to 0%.