Methodology to identify any overlapping/duplicate Geolocation POIs in Python - python-3.x

I'm trying to build a map to show spending. Our vendor data is not entirely clean, and we could have 5 or 6 definitions of the same place. We also have hierarchical POI issues (think of a record that gives an entire shopping strip geometry instead of the individual stores). I'd like to know if there are any common algorithms to identify duplicates at the geometry object level.
My data are in GeoPandas DataFrames with a 2-D polygon to represent a place, plus other metadata (e.g. store name, category, etc.). Sometimes a record comes with an address, but not always, and not in a very consistent postal format.
I'm working in Python specifically, but open to any suggestions, not exactly limited to Python. Thank you in advance for any help or suggestions.

Related

.simplemap to octomap/point cloud and ground truth robot pose conversion

I want to use the dataset at https://ingmec.ual.es/datasets/lidar3d-pf-benchmark/ in my project. The available map is a .simplemap file. As I understand it, it stores both the map and the robot poses. I want to get the point cloud representation of this map (which I can later convert into an octomap) as well as the vehicle's ground truth pose in the map.
I have been able to get the CPose3DPDF, from which I obtained a CPose3D, which I believe is the vehicle's ground truth pose. Please correct me if I am wrong. Now I have two problems. First, the length of the trajectory is just 97, which makes me suspicious of the code I used to obtain it. Second, there is the CSensoryFrame which I obtain along with the CPose3DPDF. When I get a CObservation by calling CSensoryFrame->getObservationByIndex and write it to a file, it looks like it stores Velodyne readings, but I am unable to recover a point cloud from it. Could anyone please point me to a tool which can convert a .simplemap into a point cloud or an octomap representation and also extract the vehicle's pose from it? Many thanks in advance.
For the record: this one was answered here:
Your assumptions were all correct.
I realized the full UAL campus map was not included in the downloads. It's now available to download inside 2018-02-26-ual-campus-map.zip, at the bottom of this dataset page.
You can also regenerate the point cloud or octomap from the .simplemap using the app application-observations2map.
Example .ini files can be found under MRPT/share/mrpt/config_files/*
You can also visually inspect .simplemap files with the robot-map-gui app.

Using Learning To Rank on textual documents?

I need some help implementing Learning To Rank (LTR). It is related to my semester project and I'm totally new to this. The details are as follows:
I gathered around 90 documents and wrote 10 user queries. Now I have to rank these documents for each query using three algorithms, specifically LambdaMART, AdaRank, and Coordinate Ascent. Previously I applied clustering techniques on a Vector Space Model, and that was easy. In this case, however, I don't know how to prepare the data for these algorithms, as my textual data (documents and queries) is in txt format in separate files. I have searched for solutions online and have been unable to find a proper one, so can anyone here please guide me in the right direction, i.e. the steps to follow? I would really appreciate it.
As you said, you have already applied clustering in the vector space model. The input to these algorithms is also vectors.
Why don't you have a look at the standard dataset introduced for learning to rank (the LETOR benchmark), in which documents are represented as vectors of features?
There is also an implementation of these algorithms in Java (RankLib), which may give you an idea of how to solve the problem. I hope this helps!
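As a rough sketch of what "documents as vectors of features" can look like in practice, the snippet below turns query/document text pairs into lines in the LETOR-style text format that RankLib reads (`<label> qid:<id> 1:<value> 2:<value> ... # <info>`). The three features, the sample texts, and the relevance labels are made-up placeholders; real setups typically use richer features such as TF-IDF or BM25 scores.

```python
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()

def features(query, document):
    """Return a small, illustrative feature vector for one query/document pair."""
    q_terms = tokenize(query)
    d_counts = Counter(tokenize(document))
    doc_len = sum(d_counts.values())
    matched = sum(1 for t in set(q_terms) if t in d_counts)  # distinct query terms found
    tf_sum = sum(d_counts[t] for t in q_terms)                # summed term frequency
    return [matched, tf_sum, doc_len]

def letor_line(label, qid, feats, doc_id):
    """Format one example as '<label> qid:<qid> 1:<f1> 2:<f2> ... # <doc_id>'."""
    feat_str = " ".join(f"{i}:{v}" for i, v in enumerate(feats, start=1))
    return f"{label} qid:{qid} {feat_str} # {doc_id}"

# Hypothetical data: the relevance labels would come from your own judgments.
queries = {1: "learning to rank"}
documents = {
    "d1": "learning to rank with lambdamart",
    "d2": "clustering text documents",
}
judgments = {(1, "d1"): 2, (1, "d2"): 0}

lines = [
    letor_line(label, qid, features(queries[qid], documents[doc_id]), doc_id)
    for (qid, doc_id), label in judgments.items()
]
print("\n".join(lines))
```

Writing these lines to a text file, one per query/document pair and grouped by query id, gives a training file in the shape the LETOR datasets use.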

Why is it usually easier to perform selection tests in object space?

I'm taking an introductory graphics course, and while I intuitively understand that converting a click or touch into object coordinates will make the math much cleaner, reduce the chances for human error, and potentially make debugging easier, none of these are actually a very good explanation, conceptually, of why object coordinate spaces are used in selection tests, as opposed to simply using world coordinates for the test - rather, they're just observations of what tends to happen when object coordinates are used. So I ask: why?
A selection test involves comparing the click coordinates, which you get in window coordinates, against lots and lots of object features, which are represented in object coordinates.
You need to transform them into the same coordinate system in order to do the checks, so you can EITHER transform the one simple click point OR you can transform all the various object features.
Transforming one point or line is just a lot easier than transforming a whole bunch of object features of various types.
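A minimal sketch of the "transform the one click point" option, in 2D and plain Python. It assumes the click has already been converted from window to world coordinates, and that the object's model transform is a uniform scale, then a rotation, then a translation. The function names and the rectangle are illustrative only.

```python
import math

def world_to_object(point, position, angle, scale):
    """Apply the inverse of the model transform (scale, then rotate, then translate)."""
    dx, dy = point[0] - position[0], point[1] - position[1]  # undo translation
    c, s = math.cos(-angle), math.sin(-angle)                # undo rotation
    rx, ry = c * dx - s * dy, s * dx + c * dy
    return rx / scale, ry / scale                            # undo scale

def hit_rect(click_world, position, angle, scale, half_w, half_h):
    """Test a world-space click against a rectangle that is axis-aligned in object space."""
    x, y = world_to_object(click_world, position, angle, scale)
    return abs(x) <= half_w and abs(y) <= half_h

# A rectangle with half extents 1 and 0.5 in object space, rotated 90 degrees,
# doubled in size, and centred at (10, 10). In world space it covers
# x in [9, 11] and y in [8, 12].
inside = hit_rect((10.5, 11.5), (10, 10), math.pi / 2, 2.0, 1.0, 0.5)
outside = hit_rect((11.5, 10.0), (10, 10), math.pi / 2, 2.0, 1.0, 0.5)
print(inside, outside)
```

In object space the test is two comparisons against the half extents, however the object is placed in the world. Doing the same test in world space would mean transforming all four corners and running a general point-in-polygon check.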
There are cases where the location of a specific object or point may not be known within a world coordinate system, but is known relative to some other coordinate system.
To summarize an example from my course text, consider the idea of two different towns, one using a grid system for its layout, and the other using what I can only describe as the New England we-made-cow-trails-into-roads method. A government employee is tasked with creating a layout of the area which includes them, and in doing so has to convert the two coordinate systems into a third, which encompasses the other two.
Sometimes, using a world atlas just isn't practical to get across the street, and so something much more local (and relevant) is used instead, as it provides much more detail over a much smaller area.
The text also explains that it may be more than simply impractical to use a given coordinate system - it may yield results that are improbable or just plain wrong. This is evidenced in the evolution of the geocentric and heliocentric models of the universe - the distance of the stars from us was calculated with very different results using the two models.
Thinking of my own example, the best that comes to mind would be something like your own internal organs - from the outside, you don't know for sure exactly the shape, size, and structure of each of them, but your own body does. In order to be able to access that information, you need to look inside the body (ideally in a way that doesn't kill you). It's not something that is plainly observable from outside.

Parsing addresses with ambiguous data

I have phone numbers and village names collected from villagers via forms. For various reasons the data is inaccurate or incomplete.
The idea is to validate these two data points before adding them to the database/store.
The phone numbers are formatted programmatically and validated via an external API, which gives me the service provider and province information.
The problem is with the addresses.
No standardized address line. Tons of ambiguity.
Numeric street names and door numbers exist.
Input string will sometimes contain an addressee.
Possible solutions I can think of
Reverse geocoding helps, but it is not very accurate in the Indian context. The Google TOS also prohibits automated queries (correct me if I'm wrong here).
Soundexing. Again, not very accurate with Indian data.
I understand it's difficult to handle such highly unstructured data, but I'm looking for ways to achieve at least enough accuracy to map addresses to the nearest point of interest.
Queries
Given a village name from a villager who might misspell or abbreviate it, how do I get the correct official name of the village and its location?
Are there any ways to sanitize bad locations/addresses or decode complex/poorly formed addresses?
Are there any machine learning solutions that can help, so that the system learns from every computation? (I have zero knowledge of ML, so do correct me if I'm wrong here.)
What you want is a geolocation system that works with informal text input. I have previously used a text-based geolocation model trained on Twitter data.
To solve your problem, you need training data in the form of:
informal_text village_name
If you have access to such data (e.g. using the addresses which can be geolocated), then you can train a text-based classifier that, given a new informal address, predicts where on the map it points to. In your case every village becomes a class label. You can use scikit-learn to train the classifier.
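A minimal sketch of such a classifier with scikit-learn, assuming training pairs of the form (informal_text, village_name). The village names and spellings below are invented for illustration. Character n-grams and logistic regression are just one plausible choice, picked because character n-grams tend to tolerate misspellings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training pairs: (informal_text, village_name).
training = [
    ("rampur", "Rampur"),
    ("rampur village near temple", "Rampur"),
    ("rampoor post office", "Rampur"),
    ("kothapalli", "Kothapalli"),
    ("kotapalli village main road", "Kothapalli"),
    ("kothapally post office", "Kothapalli"),
]
texts, labels = zip(*training)

# Character n-grams within word boundaries, so near-miss spellings
# still share most of their features with the correct name.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4), lowercase=True),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)

# Each village is a class label; a new misspelled input is mapped to one of them.
prediction = model.predict(["rampor vilage"])[0]
print(prediction)
```

With real data you would want many examples per village and a held-out set to measure accuracy. `model.predict_proba` can be used to flag low-confidence inputs for manual review.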

How can I select layer by location AND attributes in ArcMap?

I have a dataset (i.e. a shapefile) containing spatial location data (coordinates) and elevation data as well as other attribute fields.
I want to select points which have at least 200m vertical separation (i.e. are at least 200m apart on the z-axis) AND are within 3km of each other.
The aim is to create a new shapefile with all points that have this relationship with 1 or more other points.
I'm sure there is a solution to this problem (maybe not using ArcMap at all?), but I just can't find it. Any help would be greatly appreciated.
Chris
You are going to have much better luck asking this question on gis.stackexchange.com. There are many more ESRI users/programmers there. As a matter of fact, I bet you will find your solution there without having to ask the question.
You can run the ArcGIS Near tool on all the points.
Then select by attribute points with Z values of >200m and distance values of <3000m.
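Outside ArcMap, the same selection can be sketched in plain Python as a brute-force check over every pair of points, which also catches cases where the qualifying partner is not the nearest neighbour. This assumes the points have been exported as (id, x, y, z) tuples in a projected coordinate system with metre units. The sample points and the function name are hypothetical.

```python
import math
from itertools import combinations

def select_points(points, max_horizontal=3000.0, min_vertical=200.0):
    """Return ids of points with at least one partner that is within
    max_horizontal metres in plan and at least min_vertical metres apart in z."""
    selected = set()
    for (id_a, xa, ya, za), (id_b, xb, yb, zb) in combinations(points, 2):
        horizontal = math.hypot(xa - xb, ya - yb)
        vertical = abs(za - zb)
        if horizontal <= max_horizontal and vertical >= min_vertical:
            selected.update((id_a, id_b))
    return sorted(selected)

# Hypothetical points as (id, x, y, z), coordinates in metres.
points = [
    ("A", 0.0, 0.0, 100.0),
    ("B", 1000.0, 0.0, 350.0),  # 1 km from A and 250 m higher
    ("C", 2000.0, 0.0, 120.0),  # 1 km from B and 230 m lower
    ("D", 9000.0, 0.0, 900.0),  # more than 3 km from every other point
]
print(select_points(points))
```

The pairwise loop is quadratic in the number of points, so for large datasets a spatial index to pre-filter candidates within 3 km would be the natural next step.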
