Is there any order to soccer positions (i.e. Defender, Midfielder, Forward)? - statistics

I can't tell whether soccer positions have an inherent order or whether the order is arbitrary. I need to decide whether this variable is ordinal data or not.

Firstly, the variable here is categorical, and the rules of football do not impose any particular order on the positions.
You can list them either way, for example:
Defender, Midfielder and Forward
Forward, Midfielder and Defender
Hence, soccer position is measured on a nominal scale.

Related

Distance between straight lines

I work in the oil & gas industry and I'm seeking advice about how to calculate the minimum distance between a set of wells (the wells are drawn as straight lines on a map). My goal is for each individual well to have a unique "spacing" value (measured in feet) which is basically the straight-line horizontal distance to the closest wellbore on a map. Below is a simple example of what I'm trying to accomplish (assume the pipe | symbol is a wellbore and the dashes are the distance between the wells)
|--|---|-|
In the drawing above we have 4 wells. The 1st well (starting from the far left) would have a spacing value of 2 (since there are 2 dashes to the closest well), the 2nd well would also have a value of 2 (since the closest well is the one to the far left which is two spaces away), the 3rd well would have a value of 1, and the 4th well would have a value of 1.
Now imagine that I have hundreds of these wells (each with latitude/longitude points that describe the start & end points of each well) and I have them all mapped in TIBCO Spotfire (scattered across Texas). Do you guys know if it would even be possible to automate a calculation like the above? I would also like to build in a rule that says the max distance between wells is 2640 ft (half of a mile).
Any ideas are appreciated!
I think you should be able to do this without any R or IronPython.
Within Spotfire, you can calculate the distance in miles between 2 points using the formula below (substitute 6371 for 3958.756 to get the answer in kilometres).
GreatCircleDistance([Lat 1],[Lon 1],[Lat 2],[Lon 2]) * 3958.756
For your use case, you could cross join your table of locations so that you have a row for every possible pair of locations, then calculate the distance between them using the formula above. After that, it should be pretty straightforward to find each well's closest neighbour.
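Outside Spotfire, the same cross-join idea can be sketched in Python. This is only an illustration under stated assumptions: each well is reduced to a single (lat, lon) point (for example its midpoint), the 2640 ft rule is read as a cap on the reported spacing, and the names `haversine_ft`, `spacing_ft` and the well labels are hypothetical. The haversine formula stands in for Spotfire's `GreatCircleDistance`.

```python
import math

EARTH_RADIUS_FT = 3958.756 * 5280   # Earth radius in miles (from the answer) converted to feet
MAX_SPACING_FT = 2640.0             # assumed cap: half a mile


def haversine_ft(lat1, lon1, lat2, lon2):
    """Great-circle distance in feet between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_FT * math.asin(math.sqrt(a))


def spacing_ft(wells):
    """Map each well name to the distance to its nearest other well, capped at MAX_SPACING_FT.

    wells: dict of name -> (lat, lon). Every pair is compared (the "cross join").
    """
    result = {}
    for name, (lat, lon) in wells.items():
        nearest = min(
            (haversine_ft(lat, lon, *other)
             for other_name, other in wells.items() if other_name != name),
            default=MAX_SPACING_FT,  # a lone well has no neighbour
        )
        result[name] = min(nearest, MAX_SPACING_FT)
    return result
```

Comparing every pair is quadratic in the number of wells, which is fine for hundreds of wells. Treating each well as one point ignores the length of the wellbore, so true line-to-line spacing would need a segment-distance calculation instead.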

How to calculate the neighbouring grids on geohash. Algorithms required

Hi, I am working with a database that uses an implementation of geohash.
So as shown above, as the zoom level goes down (6 zoom levels), more of abcd gets inserted into each grid. I have represented them as a rigid grid; however, the central point is different for all the grids. So for example, distance from a to b will not be the same as distance from a to c.
If it were a rigid grid, I could just get the closest four neighbouring grids; however, I cannot do that because the distances vary and the closest neighbours are not necessarily orthogonal. The only information I have from the database is the central point of each grid and the geohash key, e.g. aa, ab, etc.
How can I find the grids that are just north, west, east and south of each grid at every zoom level? (I have 6 zoom levels.)
As you can see, a cell that ends in, for example, d has a cell with the same prefix but ending in b directly north of it. This allows you to set up a mapping table.
In the cases where, like cb going north, you end up at ad, the trick is to recognize that going north from *b you need to look at the previous character (in this case, the c), go north from that character (a) and add the south-most edge of that column (d).
In short, it's a matter of implementing some mapping tables and some logic to deal with the edges.
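A minimal Python sketch of that mapping-table logic follows. It assumes each level splits a cell into a 2x2 block laid out as `a b` on the top row and `c d` on the bottom row, which is inferred from the two examples above (b is north of d, and north of cb is ad). The function name `neighbour` is hypothetical.

```python
# Assumed layout of the four children of every cell (row 0 is the northern row):
#   a b
#   c d
POS = {"a": (0, 0), "b": (0, 1), "c": (1, 0), "d": (1, 1)}   # letter -> (row, col)
CHAR = {pos: letter for letter, pos in POS.items()}          # (row, col) -> letter
STEP = {"north": (-1, 0), "south": (1, 0), "west": (0, -1), "east": (0, 1)}


def neighbour(key, direction):
    """Return the key of the adjacent cell at the same zoom level, or None at the outer edge."""
    if not key:
        return None  # stepped off the top-level grid
    d_row, d_col = STEP[direction]
    row, col = POS[key[-1]]
    new_row, new_col = row + d_row, col + d_col
    if 0 <= new_row <= 1 and 0 <= new_col <= 1:
        # Still inside the same parent: only the last character changes.
        return key[:-1] + CHAR[(new_row, new_col)]
    # Crossed the parent's edge: move the parent, then wrap to its opposite edge.
    parent = neighbour(key[:-1], direction)
    if parent is None:
        return None
    return parent + CHAR[(new_row % 2, new_col % 2)]
```

The recursion handles all six zoom levels with the same code, because crossing an edge at one level just becomes a neighbour lookup one level up.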

Correlation statistics

Naive Question:
In the attached snapshot, I am trying to understand how correlation behaves when it is applied to actual values versus a new stream of data calculated from those values.
In the example,
columns A, B, C, D, E have very different correlations, but when I do a rolling sum on the same columns to get G, H, I, J, K, the correlations are all very much the same (negative or positive).
Are these two different types of correlation, or am I missing something?
Thanks in advance!!
Yes, these are different correlations. It's as if you measured the acceleration over time of 5 automobiles (your first set of data) and correlated those accelerations. Each car accelerates at a different rate over time, leaving your correlations all over the place.
Your second set of data would be the velocity of each car at each point in time. Because each car is accelerating at a fairly constant rate (and doing so in one of two directions from the starting point), you get either a big positive or a big negative correlation.
You won't necessarily get that big positive or big negative correlation in the second set, but since the data in each of your lists is consistently positive or negative and grows at a consistent rate, each list correlates strongly with the other similar lists.
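The effect can be reproduced with synthetic data. The sketch below is not the asker's spreadsheet: it assumes the "rolling sum" behaves like a running total, and it uses made-up series whose values are noisy but consistently positive or consistently negative on average. The helper `pearson` and all variable names are hypothetical.

```python
import random
from itertools import accumulate


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 200

# Independent noisy series: two drift upward on average, one drifts downward.
up_1 = [1.0 + rng.gauss(0, 1) for _ in range(n)]
up_2 = [1.0 + rng.gauss(0, 1) for _ in range(n)]
down = [-1.0 + rng.gauss(0, 1) for _ in range(n)]

raw_r = pearson(up_1, up_2)                                             # weak: the noise is independent
cum_same = pearson(list(accumulate(up_1)), list(accumulate(up_2)))      # strongly positive
cum_opposite = pearson(list(accumulate(up_1)), list(accumulate(down)))  # strongly negative
```

The running totals are dominated by their shared trend over time, so they correlate strongly even though the underlying values do not.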

Excel formula to calculate the distance between multiple points using lat/lon coordinates

I'm currently drawing up a mock database schema with two tables: Booking and Waypoint.
Booking stores the taxi booking information.
Waypoint stores the pickup and drop off points during the journey, along with the lat lon position. Each sequence is a stop in the journey.
How would I calculate the distance between the different stops in each journey (using the lat/lon data) in Excel?
Is there a way to programmatically define this in Excel, i.e. so that a formula can be placed into the mileage column (Booking table), lookup the matching sequence (via bookingId) for that journey in the Waypoint table and return a result?
Example 1:
A journey with 2 stops:
1 1 1 MK4 4FL, 2, Levens Hall Drive, Westcroft, Milton Keynes 52.002529 -0.797623
2 1 2 MK2 2RD, 55, Westfield Road, Bletchley, Milton Keynes 51.992571 -0.72753
4.1 miles according to Google, entry made in mileage column in Booking table where id = 1
Example 2:
A journey with 3 stops:
6 3 1 MK7 7DT, 2, Spearmint Close, Walnut Tree, Milton Keynes 52.017486 -0.690113
7 3 2 MK18 1JL, H S B C, Market Hill, Buckingham 52.000674 -0.987062
8 3 1 MK17 0FE, 1, Maids Close, Mursley, Milton Keynes 52.040622 -0.759417
27.7 miles according to Google, entry made in mileage column in Booking table where id = 3
If you want to find the distance between two points, use the formula below. The result is in km; convert to miles if needed.
Point A: Lat1, Long1
Point B: Lat2, Long2
ACOS(COS(RADIANS(90-Lat1)) *COS(RADIANS(90-Lat2)) +SIN(RADIANS(90-Lat1)) *SIN(RADIANS(90-Lat2)) *COS(RADIANS(Long1-Long2)))*6371
Regards
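For checking the spreadsheet results, the same spherical law of cosines formula can be sketched in Python, together with a helper that adds up the legs of a journey. The function names `distance_km` and `journey_miles` are hypothetical, and the stops are assumed to be passed in sequence order.

```python
import math

EARTH_RADIUS_KM = 6371.0
KM_PER_MILE = 1.609344


def distance_km(lat1, lon1, lat2, lon2):
    """Spherical law of cosines, written with colatitudes (90 - lat) as in the Excel formula."""
    colat1 = math.radians(90 - lat1)
    colat2 = math.radians(90 - lat2)
    dlon = math.radians(lon1 - lon2)
    cos_angle = (math.cos(colat1) * math.cos(colat2)
                 + math.sin(colat1) * math.sin(colat2) * math.cos(dlon))
    # Clamp to guard against rounding pushing the value just outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.acos(cos_angle) * EARTH_RADIUS_KM


def journey_miles(stops):
    """Sum the straight-line legs between consecutive (lat, lon) stops, in miles."""
    total_km = sum(distance_km(*a, *b) for a, b in zip(stops, stops[1:]))
    return total_km / KM_PER_MILE
```

For the two-stop journey in Example 1 this gives roughly 3 miles, which is shorter than Google's 4.1 miles because it measures the straight-line distance and not the road distance.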
Until quite recently, accurate maps were constructed by triangulation, which in essence is the application of Pythagoras’s Theorem. For the distance between any pair of co-ordinates, take the square root of the sum of the square of the difference in x co-ordinates and the square of the difference in y co-ordinates. The x and y co-ordinates must, however, be in the same units (e.g. miles), which involves applying a scale factor to the latitude and longitude values. This can be complicated because the factor for longitude depends upon latitude (walking all the way round the North Pole is less far than walking around the Equator), but in your case a factor for 52° North should serve. From this, the results (which might be checked here) are around 20% different from the examples you give (in the second case, by pairing IDs 6 and 7 and adding that result to the result from pairing IDs 7 and 8).
Since you say accuracy is not important, and assuming distances are small (say less than 1000 miles) you can use the loxodromic distance.
For this, compute the difference of latitudes (dlat) and the difference of longitudes (dlon), both in degrees. In the unlikely event that you are crossing the 180º meridian, take modulo 360º to ensure the difference of longitudes is between -180º and 180º. Also compute the average latitude (alat).
Then compute:
distance= 60*sqrt(dlat^2 + (dlon*cos(alat))^2)
This distance is in nautical miles. Apply conversions as needed.
EXPLANATION: This takes advantage of the fact that one nautical mile was originally defined as one arc-minute of latitude (it is now fixed at exactly 1852 m), so one degree of latitude is about 60 nautical miles. The cosine accounts for the meridians getting closer to each other as they approach the poles. The rest is just an application of Pythagoras' theorem, which requires that the relevant portion of the globe be flat; that is of course only a good approximation for small distances.
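As a rough sketch, the approximation above looks like this in Python. The function name `flat_distance_nm` and the conversion constant are hypothetical names chosen for illustration; inputs are in degrees.

```python
import math

# 1 nautical mile = 1852 m, 1 statute mile = 1609.344 m
STATUTE_MILES_PER_NAUTICAL_MILE = 1852 / 1609.344


def flat_distance_nm(lat1, lon1, lat2, lon2):
    """Flat-Earth distance in nautical miles: 60 * sqrt(dlat^2 + (dlon * cos(alat))^2)."""
    dlat = lat2 - lat1
    # Wrap the longitude difference into [-180, 180) in case the 180º meridian is crossed.
    dlon = (lon2 - lon1 + 180) % 360 - 180
    alat = math.radians((lat1 + lat2) / 2)  # average latitude, in radians for cos()
    return 60 * math.sqrt(dlat ** 2 + (dlon * math.cos(alat)) ** 2)
```

Multiplying the result by `STATUTE_MILES_PER_NAUTICAL_MILE` converts it to statute miles, so it can be compared with the mileage column.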
It all depends on what the distance is and what accuracy you require. Calculations based on "Earth locally flat" model will not provide great results for long distances but for short distance they may be ok. Models assuming Earth is a perfect sphere (e.g. Haversine formula) give better accuracy but they still do not produce geodesic grade results.
See Geodesics on an ellipsoid for more details.
One of the high accuracy (fraction of a mm) solutions is known as Vincenty's formulae. For my Excel VBA implementation look here https://github.com/tdjastrzebski/Vincenty-Excel

Way to reduce geopoints?

Does anyone have any handy algorithms that could be used to reduce the number of geo-points?
I am using a list of 2,000,000 postcodes which come with their own geo-point. I am using them to collect data from an API to be used offline. The program is written in C++.
I have to go through each postcode, calculate a bounding box based on the postcodes location, and then send it to the API which gives me some data near to that postcode.
However 2,000,000 is a lot to process and some of the postcodes are next to each other or close enough to each other that they would share some of the same data.
So far I've come up with two ways I could reduce them, but I am not sure if they would work:
1 - The program uses a data structure to record which postcodes overlap which, then runs a routine a few times that removes overlapping postcodes one by one until none of the remaining postcodes overlap.
2 - Start at the top-left geo-point of the UK and increment it by the rough size of a postcode area until the entire UK is covered.
Is there an easy way to reduce the number of postcodes so that as few of them overlap as possible, whilst still making sure I get data covering as much of the UK as possible? I was thinking there may be a handy algorithm for this that people use elsewhere.
You can use a quadtree, in particular a quadkey. A quadkey orders the points along a space-filling curve, which is similar to sorting the points into a grid. You can then traverse the grid and descend deeper into the tree, or search around a centre point. You can also use a database with a spatial index. How well this works depends on how much the data overlap, but with a quadtree you can choose the size of the grid.
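A minimal sketch of the grid idea, in Python for brevity even though the question's program is in C++. It assumes a simplified quadkey that repeatedly halves the raw latitude/longitude range (not a particular map provider's tile scheme), and keeps one representative point per cell. The names `quadkey` and `thin_points` are hypothetical.

```python
def quadkey(lat, lon, depth):
    """Return a string of digits 0-3 naming the grid cell that contains (lat, lon).

    Each digit halves the current latitude and longitude ranges, so a longer key
    means a smaller cell, and a cell's key is a prefix of its children's keys.
    """
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    digits = []
    for _ in range(depth):
        lat_mid = (lat_lo + lat_hi) / 2
        lon_mid = (lon_lo + lon_hi) / 2
        digit = 0
        if lon >= lon_mid:      # eastern half
            digit += 1
            lon_lo = lon_mid
        else:
            lon_hi = lon_mid
        if lat < lat_mid:       # southern half
            digit += 2
            lat_hi = lat_mid
        else:
            lat_lo = lat_mid
        digits.append(str(digit))
    return "".join(digits)


def thin_points(points, depth):
    """Keep the first (lat, lon) point seen in each grid cell at the given depth."""
    kept = {}
    for lat, lon in points:
        kept.setdefault(quadkey(lat, lon, depth), (lat, lon))
    return list(kept.values())
```

The `depth` argument is the knob for the grid size: a smaller depth means larger cells and fewer API calls, at the cost of coarser coverage. The bounding box sent to the API would then need to cover the whole cell and not just the area around a single postcode.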
