Is it possible to use geodata (latitude & longitude) as a predictor variable?

I'm a newbie to programming and machine learning. This is my first post here, as I've just stumbled upon the first of probably many unresolved questions.
I have an extensive database on the real-estate market of my country, and I want to predict house prices (a pretty standard problem, I know) using latitude and longitude as a predictor.
So far I have found Waddell & Besharati-Zadeh's study (https://arxiv.org/pdf/2011.14924.pdf), in which they enrich the geodata by combining it with other libraries, deriving categorical variables that indicate whether certain activities are available within a 500-meter walking distance. This is a nice alternative, but I'm worried there is no accurate data on walking distances and establishments in my country, not even on Google Maps. Is there any way the combination of latitude and longitude alone can be used as a predictor variable?
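For reference, tree-based models can consume raw coordinates directly: they split on axis-aligned thresholds, effectively carving the map into rectangular price regions, so no walking-distance data is required. A minimal sketch with scikit-learn; the file and column names (houses.csv, lat, lon, area_m2, price) are placeholders for your own data, and the reference point used for the engineered distance feature is hypothetical:

    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("houses.csv")  # placeholder file with lat, lon, area_m2, price

    # Optional engineered feature: haversine distance (km) to a reference point,
    # e.g. the city center -- useful for linear models that cannot exploit raw
    # coordinates. The reference point below is hypothetical.
    lat0, lon0 = np.radians(19.43), np.radians(-99.13)
    lat, lon = np.radians(df["lat"]), np.radians(df["lon"])
    a = (np.sin((lat - lat0) / 2) ** 2
         + np.cos(lat0) * np.cos(lat) * np.sin((lon - lon0) / 2) ** 2)
    df["dist_center_km"] = 2 * 6371.0 * np.arcsin(np.sqrt(a))

    X = df[["lat", "lon", "dist_center_km", "area_m2"]]
    y = df["price"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Trees split on coordinate thresholds, so lat/lon need no scaling or encoding.
    model = RandomForestRegressor(n_estimators=300, random_state=0)
    model.fit(X_train, y_train)
    print("held-out R^2:", model.score(X_test, y_test))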

Related

How to describe the process of transforming research data into a distribution (not a statistical distribution)?

Statistics is not my major and English is not my native language. I am trying to apply for data-analysis or data-science jobs in industry, but I do not know how to describe my research process below in a concise and professional way. I would highly appreciate any help.
Background: I simulate properties of materials using research packages such as LAMMPS. The simulated data are just the coordinates of atoms over time. Below are my data-analysis steps.
step 1: Clean the data to make sure it is complete and that each atom ID is unique and stays consistent across time moments (timesteps).
step 2: Calculate the distances from each center atom to its neighboring atoms to find the target species (a configuration formed by several target atoms, such as Al-O-H, Si-O-H, Al-O-H2, H3-O).
step 3: Count the number of each species as a function of space and/or time, and plot the species distributions over space and/or time as well as the lifetime distribution of each species.
NOTE: these distributions are empirical profiles over space and time, not statistical distributions such as the Normal or Binomial distribution.
step 4: Based on the above distributions, explore and interpret the correlations between species.
After the above steps, I study the underlying mechanism based on the materials themselves and the local environment.
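In data-analytic terms, step 2 is a fixed-radius nearest-neighbor search and step 3 builds empirical (frequency) distributions. A minimal sketch with SciPy; the coordinates are placeholder data and the cutoff value is purely illustrative:

    import numpy as np
    from scipy.spatial import cKDTree

    # coords: (N, 3) array of atom positions for one timestep (placeholder data)
    coords = np.random.rand(1000, 3) * 50.0

    # Step 2: fixed-radius neighbor search around every center atom.
    tree = cKDTree(coords)
    cutoff = 1.2  # illustrative bond-length cutoff, in the same units as coords
    neighbors = tree.query_ball_tree(tree, r=cutoff)

    # Step 3: an empirical distribution, e.g. a histogram of coordination numbers.
    coordination = np.array([len(n) - 1 for n in neighbors])  # exclude the atom itself
    hist, edges = np.histogram(coordination, bins=range(0, 10))
    print(dict(zip(edges[:-1].tolist(), hist.tolist())))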
Could anyone point out how to describe the above steps in statistical or data-analytic terms?
I sincerely appreciate your time and help.

Finding outliers or anomalies in GPS data (time, latitude, longitude, altitude)

I have GPS data (time, latitude, longitude, altitude). Based on this data, I want to determine the typical routes a device takes during a full week.
After determining the baseline routes, or the typical area frequented by the device, we can start flagging an anomaly whenever the device travels outside its frequent route/area.
Action: the process should then send an "alert" when the device is traveling outside its frequent route/area.
Please suggest which machine learning algorithm would be useful; I am going to start with a clustering algorithm. Also, which Python libraries are useful for this?
First of all, if you use Python, then use scikit-learn.
For this problem, there are multiple possibilities.
One way is indeed to use a clustering algorithm. To get the anomalies as well, you can use DBSCAN: it is an algorithm designed to produce both clusters and outliers (points labeled as noise).
Another way (assuming you have all the positions of each device) would be more playful: run a clustering algorithm on all the positions to find the important places, then apply LDA (latent Dirichlet allocation), where the "words" are cluster indices, each "document" is the list of positions of one device, and the resulting "topics" are the main "routes".
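A sketch of that two-stage idea; the data shapes and all parameter values (20 places, 5 routes) are assumptions:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.decomposition import LatentDirichletAllocation

    # One (n_i, 2) array of (lat, lon) positions per device (placeholder data)
    positions_per_device = [np.random.rand(200, 2) for _ in range(10)]

    # 1) Cluster all positions to get the "important places" (the vocabulary).
    all_pos = np.vstack(positions_per_device)
    n_places = 20
    places = KMeans(n_clusters=n_places, n_init=10, random_state=0).fit(all_pos)

    # 2) Each device becomes a "document": a bag of visited place indices.
    docs = np.zeros((len(positions_per_device), n_places), dtype=int)
    for i, pos in enumerate(positions_per_device):
        idx, counts = np.unique(places.predict(pos), return_counts=True)
        docs[i, idx] = counts

    # 3) LDA topics over places ~ the main "routes"; show the top-3 places per route.
    lda = LatentDirichletAllocation(n_components=5, random_state=0).fit(docs)
    print(lda.components_.argsort(axis=1)[:, -3:])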

Creating a radar image from web API data

To get familiar with front-end web development, I'm creating a weather app. Most of the tutorials I found display the temperature, humidity, chance of rain, etc.
Looking at the Dark Sky API, I see that the "Time Machine Request" returns observed weather conditions, and the response contains a precipIntensity field: "The intensity (in inches of liquid water per hour) of precipitation occurring at the given time. This value is conditional on probability (that is, assuming any precipitation occurs at all)."
That made me wonder about creating a "radar image" of precipitation intensity.
Assuming other weather APIs are similar, is generating a radar image of precipitation as straightforward as:
1. Create a grid of latitude/longitude coordinates.
2. Submit a request for weather data for each coordinate.
3. Build a color-coded grid of the received precipitation intensity values and smooth between them.
Or would that be considered a misuse of the data?
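Mechanically, the pipeline is exactly those three steps. A rough sketch; fetch_precip_intensity is a hypothetical stand-in for whatever API call gets used, and the bounding box is arbitrary:

    import numpy as np
    import matplotlib.pyplot as plt

    def fetch_precip_intensity(lat, lon):
        # Hypothetical stand-in for a weather-API request at (lat, lon)
        return np.random.rand()  # placeholder value

    # 1) Grid of coordinates over an arbitrary bounding box.
    lats = np.linspace(29.0, 31.0, 20)
    lons = np.linspace(-99.0, -97.0, 20)

    # 2) One request per grid cell (beware API rate limits at this volume).
    grid = np.array([[fetch_precip_intensity(la, lo) for lo in lons] for la in lats])

    # 3) Color-coded display, smoothed between grid points.
    plt.pcolormesh(lons, lats, grid, shading="gouraud", cmap="Blues")
    plt.colorbar(label="precip intensity (in/hr)")
    plt.show()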
Thanks,
Mike
This would most likely end up as a very low-resolution product. Let me explain.
Weather observations come from input sources ranging from mesonet stations and airports to programs like the Citizen Weather Observer Program. Thousands of these inputs feed into the NOAA MADIS system, a centralized server that stores all observations, and the companies that offer the APIs pull their data from MADIS.
The problem with the observed conditions is twofold. First, the stations are highly clustered in urban areas. In Texas, for example, there are hundreds of stations in central Texas near San Antonio and Austin, but 100 miles west there is essentially nothing. Generating a radar image this way would require extreme interpolation.
The second problem is observation time. Input from rain gauges is often delayed by several minutes to an hour or more, which would give you inaccurate data.
If you want a gridded system, the best answer is to use MRMS (Multi-Radar Multi-Sensor) data from the NWS. It is not an API; these are GRIB files that must be downloaded and processed. There is a live viewer, and if you want to work on the data itself you can use the NOAA Weather and Climate Toolkit to view and/or process it via GUI or batch processing (you can export to GeoTIFF and colorize it with GDAL tools). For the basic usage you are looking for, use the latest data in the "MergedReflectivityComposite" folder; that is how other radar apps show rain. If you want actual precipitation intensity, check the "PrecipRate" folder.
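If you prefer to script the GRIB processing in Python rather than use the toolkit GUI, pygrib is one common option; a minimal sketch, with the filename standing in for an MRMS file you have already downloaded:

    import pygrib

    # Placeholder filename for a downloaded MRMS GRIB2 product
    grbs = pygrib.open("MRMS_PrecipRate.grib2")
    grb = grbs[1]               # MRMS products typically contain a single message
    data = grb.values           # 2-D array of values (here, precip rate)
    lats, lons = grb.latlons()  # matching 2-D coordinate arrays
    print(grb.name, data.shape, data.max())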
For anything other than radar (warning polygons, etc.), the NWS has an API as well.
If you have other questions, I will be happy to help.

Excel Geocoding -- extract coordinates from map

I've been trying to get longitude and latitude coordinates for Japanese addresses. There are a number of ways to do this, but most of them (such as Google Maps) only allow a limited number of queries per day (I have ~15,000), and many do not support Japanese addresses.
Here is an example form of the addresses that I am using:
東京都千代田区丸の内1-9-1
Recently, however, I found that the 3D Maps tool in Excel 365 can plot these addresses on a map, and it is fast enough to handle all of them. Although I can see the points in Excel, I don't know whether there's a way to export them as longitude-latitude coordinate pairs.
Does anyone know a way to get the longitude-latitude pairs from Excel's 3D maps feature?
I've been working on exactly the same issue for weeks, and I think the best solution is the Google Geocoding API.
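A minimal sketch of that approach using the requests library; you need your own API key, and at ~15,000 addresses you should check the current pricing and rate limits first:

    import time
    import requests

    API_KEY = "YOUR_KEY"  # your own Google API key
    URL = "https://maps.googleapis.com/maps/api/geocode/json"

    def geocode(address):
        # Return (lat, lng) for an address, or None if there is no match.
        resp = requests.get(URL, params={"address": address, "key": API_KEY,
                                         "language": "ja"})
        results = resp.json().get("results")
        if not results:
            return None
        loc = results[0]["geometry"]["location"]
        return loc["lat"], loc["lng"]

    print(geocode("東京都千代田区丸の内1-9-1"))
    time.sleep(0.1)  # throttle between requests when batching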

How to find bounding boxes for geocode radius searches

If I want to find all restaurants within a zip code, I can do a string search on the address; if I want to find all restaurants within 10 miles of a zip code, I need to do a location search. I have a database full of addresses, and geocoding them should be no problem. But how do I compute the bounding box of an irregularly shaped area, like a zip code, city, state, or metro area?
Is there a tool that does this? Is this information for sale somewhere?
My initial idea is to estimate each area by searching for all addresses within it, deriving the simplest polygon that surrounds them, and using that as the bounding box. But this seems like a brute-force approach. Do I run this calculation for every city, state, and zip in my database and store the results? How have other people solved this problem?
Companies such as Maponics have polygon data on neighborhoods, counties, cities, states, provinces, townships, etc. There may be other providers.
Many of these polygons have huge numbers of points, so you should either:
compute bounding boxes, or
precompute the zip, neighborhood, city, etc. identifier for each address, and index a search collection by these regions.
But why build your application by storing a database of places and computing geographic data on your own? You can partner with providers such as CityGrid; they provide APIs for places that can be searched by neighborhood, zip, etc.; you can use their data for free in your own local application.
If you happen to be using PostgreSQL for your database, you can use box(geometry) or a variation thereof to compute the bounding box for a geometry. You can also implicitly use the bounding box for a geometry in your SQL. For example (from Using PostGIS: Data Management and Queries):
SELECT road_id, road_name FROM roads WHERE roads_geom && ST_GeomFromText('POLYGON((...))',-1);
where && "tells whether the bounding box of one geometry intersects the bounding box of another".
To get the bounding box for a collection of geometries, you can first use ST_Collect or ST_Union to aggregate or combine all the geometries together.
Of course, if you are not using PostGIS, the functionality really comes from GEOS, the underlying library that PostGIS uses. The basic geometry functions can be used directly (from Python, for example) to do what you want.
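For instance, with Shapely (a common Python wrapper around GEOS), both the bounding box and the "simplest surrounding polygon" from the question are one-liners; the points are placeholder geocoded addresses:

    from shapely.geometry import MultiPoint

    # Placeholder (lon, lat) points for geocoded addresses inside one zip code
    points = MultiPoint([(-98.49, 29.42), (-98.47, 29.44), (-98.51, 29.40)])

    print(points.bounds)       # (minx, miny, maxx, maxy) -- the bounding box
    print(points.convex_hull)  # simplest enclosing polygon, as in the brute-force idea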
