Creating a scatter plot with series from one column's values, and XY values from two other columns - Excel

I am working on a project that requires me to study the various conditions that affect GPS accuracy. After collecting a set of data and dumping it into Excel, I tried to plot a scatter graph that groups the data into different series according to a value: I want to plot the Latitude and Longitude values as the XY scatter values, and separate the series by the number of satellites available when the fix was obtained.
Timestamp Latitude Longitude #Satellites
133009.279 3839.3354 904.7395 0
133010.279 3839.3354 904.7395 0
133011.279 3839.3354 904.7395 0
133026 3845.9863 907.4513 4
133027 3845.986 907.4491 4
133028 3845.9851 907.448 4
133222 3845.9909 907.4866 4
133023.28 3845.9817 907.4429 5
133024.28 3845.9867 907.4549 5
133048 3845.9868 907.452 5
133205 3845.9929 907.4858 5
133206 3845.9927 907.486 5
133207 3845.9925 907.4862 5
133056 3845.9885 907.4569 6
133057 3845.9881 907.4578 6
133223 3845.9905 907.4868 6
133224 3845.9901 907.487 6
I have tried selecting the three columns, adding the series afterwards by selecting the appropriate column, and even using pivot tables, but pivot charts unfortunately don't support scatter plots.
All of this was to no avail, but I am sure this graph can be plotted. Does anyone have an idea?
PS: Manually selecting each series isn't an option, since there is a large amount of data. However, if I could select all of the rows for one specific #Satellites value, I could build each series, and I think I would be able to take it from there.

Have a look at the XY charts from FusionCharts XT - http://www.fusioncharts.com/demos/gallery/#bubble-and-xycharts
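The grouping the question asks for (one series per #Satellites value, with Latitude as X and Longitude as Y) can be sketched in Python as a minimal illustration. It assumes the data has already been read into a list of tuples; the rows below are copied from the table in the question. Each resulting list holds exactly the X/Y pairs that one Excel series would need.

```python
from collections import defaultdict

# Rows copied from the question: (timestamp, latitude, longitude, satellites)
ROWS = [
    (133009.279, 3839.3354, 904.7395, 0),
    (133010.279, 3839.3354, 904.7395, 0),
    (133011.279, 3839.3354, 904.7395, 0),
    (133026, 3845.9863, 907.4513, 4),
    (133027, 3845.986, 907.4491, 4),
    (133028, 3845.9851, 907.448, 4),
    (133222, 3845.9909, 907.4866, 4),
    (133023.28, 3845.9817, 907.4429, 5),
    (133024.28, 3845.9867, 907.4549, 5),
    (133048, 3845.9868, 907.452, 5),
    (133205, 3845.9929, 907.4858, 5),
    (133206, 3845.9927, 907.486, 5),
    (133207, 3845.9925, 907.4862, 5),
    (133056, 3845.9885, 907.4569, 6),
    (133057, 3845.9881, 907.4578, 6),
    (133223, 3845.9905, 907.4868, 6),
    (133224, 3845.9901, 907.487, 6),
]


def split_into_series(rows):
    """Return {satellite_count: [(latitude, longitude), ...]}, one entry per series."""
    series = defaultdict(list)
    for _timestamp, lat, lon, sats in rows:
        # X = Latitude, Y = Longitude, as described in the question
        series[sats].append((lat, lon))
    # Sort by satellite count so the series come out in a predictable order
    return dict(sorted(series.items()))


if __name__ == "__main__":
    for sats, points in split_into_series(ROWS).items():
        print(f"{sats} satellites: {len(points)} points")
```

With the sample data this yields four series (0, 4, 5 and 6 satellites) of 3, 4, 6 and 4 points.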

Related

Does the order of data points in Excel influence the regression results?

I tried to do a regression analysis with 91 data points. When I first ran the regression, I got an R value of 0.366733. Later I sorted the data points from smallest to largest and ran the regression again. My new R value is 0.04323. Does the order in which the original data points are arranged influence the regression analysis?
The ordering of paired data points does not matter in regression.
For example:
5 9
6 1
3 7
9 5
6 4
These pairs give a correlation (which equals the slope of the standardized regression) of -0.37.
If I reorder the entire dataset based on the column 1 values:
3 7
5 9
6 1
6 4
9 5
I get the same correlation of -0.37. Notice that the pairs are still aligned, i.e. both columns were sorted together.
But in Excel it's very easy to end up in a situation like the following, where you sort by only a single column. One column ends up ordered, but the pair alignment is broken because the second column doesn't change:
3 9
5 1
6 7
6 5
9 4
Now I get a correlation of -0.41. The pairs of data are no longer aligned, which effectively makes this a completely different dataset.
Bottom line: when you're sorting in Excel, make sure you've selected all of your data for the sort and not just a single column.
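The three correlations in this answer can be reproduced with a short Python sketch. It computes the Pearson coefficient by hand (a plain textbook implementation, not Excel's own routine) on the five example pairs, first in the original order, then with whole pairs sorted by column 1, then with only column 1 sorted.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


pairs = [(5, 9), (6, 1), (3, 7), (9, 5), (6, 4)]

# Sorting whole pairs by column 1 keeps every x matched with its own y
sorted_pairs = sorted(pairs, key=lambda p: p[0])

# Sorting only column 1 breaks the pairing: the y column keeps its old order
broken_x = sorted(x for x, _ in pairs)
original_y = [y for _, y in pairs]

print(round(pearson(*zip(*pairs)), 2))          # -0.37 (original order)
print(round(pearson(*zip(*sorted_pairs)), 2))   # -0.37 (pairs sorted together)
print(round(pearson(broken_x, original_y), 2))  # -0.41 (only column 1 sorted)
```

Only the last call changes the result, because it is the only one that changes which x value is matched with which y value.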

Add horizontal axis per series in Excel

How frustrating Excel is... I've been working on this for half an hour now.
I'm simply trying to make a frequency plot of two groups in different colours. On the x-axis I would like to display the subject IDs for each bar.
However, if I select a different range for the horizontal (x) axis of each series (series 1 = blue, series 2 = orange) containing the subject IDs, it changes the x-axis of the other series to the same range. What the hell am I doing wrong?
3007 1
23121 1
3009 1
3005 1
3011 2
23171 2
3207 2
3102 3
3207 6
13302 7
2411 11
23191 11
3008 11
3106 12
110031 1
110031 1
110030 1
110017 1
110014 1
110008 1
110004 1
110007 2
110035 4
110020 4
110003 4
110036 10
110019 11
110015 21
AFAIK, you cannot put 2 series onto the x axis.
You have two alternative ways to solve your problem:
Concatenate each positional pair into a new column and use this as the x-axis label series. It will look like this:
You could use data labels for each series. However, this will add the data to the columns themselves and not to the axis (you could put the labels at the base of the columns). To do so, right-click on the graph and select 'Add Data Labels'. By default it adds the value as the label, but you can select the labels, right-click to format them, and use the 'Value From Cells' option. Once you do this and adjust the orientation and position of the labels, it will look like this:
For simplicity, I'd go with the first method.
Adding a third option: simply put the axis-label columns beside each other, and when selecting the data for the axis labels, select both columns instead of the usual one. It will look like this:
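The first method (concatenating each positional pair into one label column) amounts to joining the two ID lists element by element. Below is a minimal Python sketch, assuming the first 14 values listed in the question are series 1's subject IDs and the next 14 are series 2's; the " / " separator is an arbitrary choice. In Excel itself the same helper column could be filled with a formula such as =A2&" / "&C2, where columns A and C are hypothetical locations of the two ID lists.

```python
# Subject IDs per series, assumed from the two blocks of data in the question
series1_ids = [3007, 23121, 3009, 3005, 3011, 23171, 3207,
               3102, 3207, 13302, 2411, 23191, 3008, 3106]
series2_ids = [110031, 110031, 110030, 110017, 110014, 110008, 110004,
               110007, 110035, 110020, 110003, 110036, 110019, 110015]


def combined_labels(ids_a, ids_b, sep=" / "):
    """Build one axis label per category position by joining the IDs at the same index."""
    if len(ids_a) != len(ids_b):
        raise ValueError("both series need the same number of categories")
    return [f"{a}{sep}{b}" for a, b in zip(ids_a, ids_b)]


labels = combined_labels(series1_ids, series2_ids)
print(labels[0])  # 3007 / 110031
```

The resulting list plays the role of the single x-axis label range that both series share.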

Groupwise Probability Distribution

I have a dataframe df of GPS points. I have a geographical region that I divided into a grid. Each grid cell is represented by a pair of columns (row, col) in the dataframe. The GPS points are labelled with their transportation modes. I want to calculate the probability distribution of transportation modes for each grid cell (there are five modes of transportation: walk, bike, car, train, subway).
Row Col P(Walk) P(Bike) P(Car) P(Train) P(Subway)
8 8 Freq(walk)/n Freq(bike)/n Freq(car)/n Freq(train)/n Freq(subway)/n
8 9 Freq(walk)/n Freq(bike)/n Freq(car)/n Freq(train)/n Freq(subway)/n
8 10 Freq(walk)/n Freq(bike)/n Freq(car)/n Freq(train)/n Freq(subway)/n
For example, the grid cell at row 8, col 8 contains 638 GPS points: 598 walk points and 40 subway points. The probability of each transportation mode for this grid cell then becomes:
Row Col P(Walk) P(Bike) P(Car) P(Train) P(Subway)
8 8 598/638 0/638 0/638 0/638 40/638
8 9 ... ... ... ... ...
8 10 ... ... ... ... ...
... ... ... ... ... ... ...
grp = df.groupby(['row','col','Transportation_Mode'])
One way is to iterate over the groups one by one with for loops to get the frequency of each transportation mode. But I think there should be an easier, more pandas-idiomatic way, or a library that can solve this in just a few lines.
An image of the geographical region is attached for a better understanding of the problem: the region is divided into grid cells represented by rows and cols, and each grid cell contains multiple GPS points labelled with their transportation modes.
The CSV file of the dataframe is available at the link below for more clarity about the data.
https://drive.google.com/open?id=1R_BBL00G_Dlo-6yrovYJp5zEYLwlMPi9
If I'm not mistaken, you're looking for a more elegant way to loop over each group object and generate a 2-dimensional probability distribution for each one?
It sounds like you should look into this pandas documentation (more specifically the apply function).
You could simply apply a visualization to each group, such as this SNS (seaborn) KDE visualization, and then join the individual plots back into a grid like the one you provided. With a little ax magic, you can construct a grid for each transportation type. I think those are the best tools at hand; I'll leave the logic to you.
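For the table itself (rather than plots), a more direct technique than the apply approach above is pandas.crosstab with normalize="index", which counts points per (row, col) and mode and divides each row by its total. This is a swapped-in technique, not the answer's method. The sketch below uses hypothetical toy data: cell (8, 8) mirrors the question's example, cell (8, 9) is made up, and the lowercase mode spellings are an assumption about the CSV.

```python
import pandas as pd

# Hypothetical toy data: cell (8, 8) mirrors the question's example
# (598 walk + 40 subway points); cell (8, 9) is made up for illustration.
df = pd.DataFrame({
    "row": [8] * 638 + [8] * 4,
    "col": [8] * 638 + [9] * 4,
    "Transportation_Mode": ["walk"] * 598 + ["subway"] * 40
    + ["bike", "bike", "car", "walk"],
})

MODES = ["walk", "bike", "car", "train", "subway"]  # assumed label spelling

# Count points per (row, col) and mode, then divide each row by its total
probs = pd.crosstab(
    index=[df["row"], df["col"]],
    columns=df["Transportation_Mode"],
    normalize="index",
)
# Add modes that never occur in the data as all-zero columns, in a fixed order
probs = probs.reindex(columns=MODES, fill_value=0)
print(probs)
```

For cell (8, 8) this gives 598/638 for walk, 40/638 for subway and 0 elsewhere. An equivalent groupby form, df.groupby(["row", "col"])["Transportation_Mode"].value_counts(normalize=True).unstack(fill_value=0), should produce the same probabilities.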

Plotting Different values in a line in Excel

Scenario: I am trying to plot values along a line: I have Max, Min, lower bound 1, upper bound 1, median value and my "Ret" value (which changes in each row; each row would have its own line "graph"). Each of these data points (max, min, bounds...) has a numerical value.
Problem: I already tried all the graphing options in excel, but can't seem to find any way to get the wanted outcome.
Question: Is there a direct way to do that in excel?
This is what I am trying to achieve (each row will have one of these graphs, once I find out how to do it, I will write a VBA macro to automate this):
Apparently, the best way to do this is to assign a second value to all of the rows and, instead of plotting them as a single column of values, plot each row as a coordinate. This answer came as advice from a user on another forum, in response to the same question posted there (goo.gl/icL38d).
For a sample data:
Value X Y
min -5 0
max 5 0
median 0 0
lowb1 -2 0
lowb2min -4 0
upb1 2 0
upb2 4 0
Target 3 0
I plotted this as a scatterplot with the coordinates, and configured the target data point to stand out. The result was very close to the originally intended one.
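The coordinate trick can be sketched in Python: every named value becomes an (x, 0) point, and the target is kept apart so it can be formatted as its own series, as done above. This is a minimal illustration assuming one row arrives as a dict of named values; the names are the sample's labels, and row_to_series is a hypothetical helper, not part of any Excel or VBA API.

```python
def row_to_series(values, target_key="Target"):
    """Turn one row of named values into scatter points on the line y = 0.

    Returns (reference_points, target_points) so that the target can be
    plotted as a separate, differently formatted series.
    """
    reference, target = [], []
    for name, x in values.items():
        point = (name, x, 0)  # every point sits at Y = 0
        if name == target_key:
            target.append(point)
        else:
            reference.append(point)
    return reference, target


sample = {"min": -5, "max": 5, "median": 0, "lowb1": -2,
          "lowb2min": -4, "upb1": 2, "upb2": 4, "Target": 3}
ref, tgt = row_to_series(sample)
print(tgt)  # [('Target', 3, 0)]
```

Running this once per row gives the two point lists that each row's chart needs, which is the part a VBA macro would have to repeat.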

How to plot multiple grouped data in one excel scatter plot with lines

I am having some difficulty plotting grouped data (by index) in one graph (scatter plot with lines) in Excel, and I would greatly appreciate your help.
My data are in three columns:
The first column is the index of the data or the group (i.e. a unique number for every set of data)
The second column is the time
The third column is the data
Group, Time, Data
1 1 12
1 3 12
1 4 28
1 8 56
1 12 37
1 24 40
1 48 34
2 0 7
2 1 14
2 4 6
2 8 63
2 12 4
2 24 35
2 48 3
and so on.
I want to plot the data vs. time for each index, i.e. each data group on its own, but on the same graph.
Until now, I have always done it manually by adding each data set separately to the graph. But I think there should be a cleverer and easier way to do it, especially since I sometimes have a lot of data (the index number can reach 70 or 80).
Thanks a lot in advance.
You can create a pivot table on all your data. Use 'Group' as the column headers and 'Time' as the row headers. The resulting pivot table will have all time points from all groups as rows and your groups as columns. Each column, of course, has entries only at the time points included in its group; the other cells are empty. If you select just the data range of this pivot table without the column headers, you can chart the data directly, since the chart omits empty cells.
Update
This is the resulting pivot table for your test data. The sorted data are in the red frame (ignore the totals).
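The reshaping that the pivot table performs (long Group/Time/Data rows into one row per time point and one column per group) can be sketched in plain Python. Missing measurements come out as None, which corresponds to the empty pivot cells. It assumes each (group, time) pair occurs at most once, similar to choosing "Don't aggregate"; the rows are the sample data from the question.

```python
# (group, time, data) rows from the question
ROWS = [
    (1, 1, 12), (1, 3, 12), (1, 4, 28), (1, 8, 56),
    (1, 12, 37), (1, 24, 40), (1, 48, 34),
    (2, 0, 7), (2, 1, 14), (2, 4, 6), (2, 8, 63),
    (2, 12, 4), (2, 24, 35), (2, 48, 3),
]


def pivot_by_group(rows):
    """Reshape long (group, time, value) rows into one row per time point.

    Returns (groups, table); each table row is (time, value_for_group_1, ...),
    with None where a group has no measurement at that time (an empty cell).
    Assumes each (group, time) pair appears at most once.
    """
    groups = sorted({g for g, _, _ in rows})
    times = sorted({t for _, t, _ in rows})
    lookup = {(g, t): v for g, t, v in rows}
    table = [(t, *(lookup.get((g, t)) for g in groups)) for t in times]
    return groups, table


groups, table = pivot_by_group(ROWS)
for line in table:
    print(line)
```

The first printed rows are (0, None, 7), (1, 12, 14) and (3, 12, None): time 0 exists only in group 2 and time 3 only in group 1.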
A way to do this in Excel 365 is:
Select the data
Go to Data -> From Table/Range to open the Power Query editor
Select the column with the group labels (Group)
Select Transform -> Pivot Column
Select the column with the values corresponding to the grouped data
Under Advanced Options change the value aggregation to Don't aggregate
Click OK, then Home -> Close and Load
This should give you the data formatted in such a way that you can select it and create a chart as normal.
