Can a "points inside single geometry" query be sped up using a spatial index? - geospatial

I am using SpatiaLite (or rather, its WebAssembly port SpatiaSQL.js) in a web application.
A use-case in this application is that the user wants to query point geometries (>500k of them, anywhere in the world) against a single (multi-)polygon geometry, filtering out points that are not inside said geometry.
I don't know the (multi-)polygon in advance, because it is yielded by a dynamic mechanism. It could be e.g. a country, province, city, or a hand-drawn custom polygon.
Using a simple query with SpatiaLite's Within function works great. Performance is also great for small (multi-)polygons, such as a city. However, when querying against, say, Australia (a huge multi-polygon with many parts), performance suffers.
I am tied to SpatiaLite or a custom solution because most spatial database solutions cannot run in client-side JavaScript.
Sorry if this is an obvious question - I haven't found any similar questions so far.
Any thoughts on how to speed this up? Some loss of accuracy is acceptable.
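For reference, the kind of query involved looks roughly like this. The table and column names and the sql.js-style db.exec call are illustrative assumptions; the SpatialIndex virtual table with search_frame is SpatiaLite's documented idiom for letting the R*Tree index prefilter rows by bounding box before the exact Within test:

```javascript
// Assumes a spatial index was created once:
//   SELECT CreateSpatialIndex('points', 'geom');
// Table/column names and the sql.js-style API are illustrative.
const wkt = 'MULTIPOLYGON(((...)))'; // the dynamic filter geometry, as WKT

const sql = `
  SELECT id
  FROM points
  WHERE ROWID IN (
          SELECT ROWID FROM SpatialIndex
          WHERE f_table_name = 'points'
            AND search_frame = GeomFromText(:wkt, 4326))  -- R*Tree bbox prefilter
    AND Within(geom, GeomFromText(:wkt, 4326));           -- exact test on survivors
`;

const rows = db.exec(sql, { ':wkt': wkt });
```

Note that for a continent-sized multi-polygon the bounding box rejects very little, so the exact test still dominates; a common follow-up trick is to split the polygon into many smaller pieces so that each piece gets a tight bounding box.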

Related

How to create geocoding service (find polygons that intersect a given point)

SITUATION
I have a database with 2,000,000 cities. All of them have coordinates of the city center, and almost all have GeoJSON boundaries. I'm trying to implement a geocoding service that finds the cities that intersect a given point, using Node.js, MongoDB, Redis, and Memcached (and Golang, if that is necessary, because I'm totally new to it).
PROBLEM
I know how to work with points (lat and lng), since both MongoDB and Redis support geo indexes, but I've never seen anything about polygons.
I guess MongoDB won't really help because of its speed (since it works on disk), but any in-memory database should be able to deal with this problem. The thing is, I can't even think of a way to implement it.
I'd be happy if someone could point me to how to do it. Thanks.
You may implement a point-in-polygon algorithm yourself. I've done something similar on https://api.3geonames.org
First do a bounding box to identify candidate polygons, then run a PIP. https://en.wikipedia.org/wiki/Point_in_polygon
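A minimal sketch of that two-phase test in JavaScript (the data layout is illustrative; holes and multi-ring polygons are omitted for brevity):

```javascript
// Phase 1: cheap bounding-box rejection; Phase 2: exact even-odd ray casting.

function inBBox([lng, lat], bbox) {
  // bbox: [minLng, minLat, maxLng, maxLat], precomputed per polygon
  return lng >= bbox[0] && lat >= bbox[1] && lng <= bbox[2] && lat <= bbox[3];
}

function pointInRing([lng, lat], ring) {
  // ring: [[lng, lat], ...] — a single closed ring of the polygon
  let inside = false;
  for (let i = 0, j = ring.length - 1; i < ring.length; j = i++) {
    const [xi, yi] = ring[i];
    const [xj, yj] = ring[j];
    if (
      yi > lat !== yj > lat &&
      lng < ((xj - xi) * (lat - yi)) / (yj - yi) + xi
    ) {
      inside = !inside;
    }
  }
  return inside;
}

// Candidates by bbox first, exact test only on the survivors.
const matches = cities.filter(
  (c) => inBBox(point, c.bbox) && pointInRing(point, c.geometry.coordinates[0])
);
```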
geo.lua (https://github.com/RedisLabs/geo.lua) works with the requirements you have here, but it's not very performant (not sure what has changed since I last checked).

No depot VRP - roadside assistance

I am researching a problem that is pretty unique.
Imagine a roadside assistance company that wants to dynamically route its vehicles. For each batch of new incidents, it wants to create routes that satisfy them, subject to some constraints (time constraints, road accessibility, vehicle-incident matching).
The company has a heterogeneous fleet of vehicles (motorbikes for the easy cases, up to tow trucks for the hard cases), and each incident has its own requirements (we know whether it needs just fuel or needs towing).
There is no depot, only the vehicles roaming the streets.
The objective is to dynamically create routes on the fly, minimizing time and total traveled distance.
Have you ever met such a problem? Do you have any idea in which VRP variant it belongs?
I have seen two previous questions but unfortunately they don't fit with my problem.
Namely, "optaplanner - VRP but with no depot" and "Does optaplanner out of box support VRP with multiple trips and no depot", which are both open VRPs.
Unfortunately I don't have code right now, as I am still modelling the way I will approach this problem.
I am really sorry for creating a suggestion question and not a real one.
Thank you so much in advance.
It's a rich dynamic/realtime vehicle routing problem. You won't find an exact name for your problem, as when VRPs get too complex they don't fit inside any of the standard categories.
It's clearly a dynamic/realtime problem (the terms are used interchangeably) as you would typically only find out about roadside breakdowns at short notice.
Sometimes you're servicing a broken down car, which would be a single stop (so a vehicle routing problem). Sometimes you're towing a car, which would be a pick-up delivery problem. So you have a mix of both together.
You would want to get to the broken-down vehicles ASAP, and some would need attention sooner than others (think of a car broken down in a dangerous position on a motorway). You would therefore need soft time windows, so you can penalise lateness, instead of the standard hard time windows supported in most VRP formulations.
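To illustrate the soft-time-window point, the cost term might look something like this (a sketch, not any particular solver's API; the field names are made up):

```javascript
// Arriving after the due time is allowed but penalised, with a higher
// weight for urgent jobs (e.g. a car stranded in a dangerous spot).
function latenessPenalty(arrivalTime, job) {
  const lateness = Math.max(0, arrivalTime - job.dueTime);
  return job.urgencyWeight * lateness;
}

// Total route cost trades travel time against lateness on urgent jobs.
function routeCost(route) {
  return (
    route.travelTime +
    route.stops.reduce((sum, s) => sum + latenessPenalty(s.arrivalTime, s.job), 0)
  );
}
```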
Also, to be able to scale to larger problems, you need an incremental optimiser that can restart from the previous (possibly now infeasible) solution when new jobs are added, vehicle positions change, etc. This isn't supported out of the box in the open-source solvers I know of.
We developed a commercial engine which does the above. We started off using the jsprit library, which supports mixing single-stop and pickup-delivery problems together. We later had to replace jsprit due to the amount of code we had to override to get it running happily for realtime problems; however, jsprit may still prove a useful starting point for you. We discuss some of the early technical obstacles we had to overcome in getting jsprit to handle realtime problems in this white paper.

tensorflow for classification of strings vs elasticsearch

So, a little bit on my problem.
TL;DR
Can I use machine-learning instead of Elastic Search to find results depending on the user's text input? Is it a good idea?
I am working on a car spare-parts project, and we have split the car into 300 parts that we store in the database, with some data for each part (weight, availability, etc.).
When the customer enters the text describing their part, we need to be able to classify it and map it to a part in our database.
Currently this is done by people on our team manually mapping the customer text to the parts in our database; we want to automate that process.
We tried using MongoDB text search, but it was often inaccurate, since parts have different names in different parts of the country.
So we wanted something that gives more accurate results and improves as we get more data. We immediately considered TensorFlow. After some research, and taking part of Google's Machine Learning Crash Course, I got to the point where it specified:
Models can't learn from string values, so you'll have to perform some feature engineering to convert those values to something numeric
That would be workable if we had a limited number of string features, but we don't know what the user will input as text.
So, my questions are:
1- Can we use Machine Learning to map text input by the user with some documents on our database?
2- If we can do that, is it a good idea to favor it over other search tools like ElasticSearch?
3- Can ElasticSearch improve its results the more data we have? How?
4- How would you go about this problem?
Note: I'd be doing this in Node.js, and since TensorFlow.js is new, I'm inclined to go with other solutions, but if push comes to shove, and the results are much better, I would definitely go there.
TL;DR: Yes and yes.
TS;WM:
This is a problem perfectly suited to machine learning. Especially so if you have a database of past customer texts that have already been mapped to parts. Ideally, you have hundreds of texts mapped to each part. If that data is available, you can design and train a network. And models can learn from string values with some feature engineering; it's not that bad.
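For instance, the "convert strings to something numeric" step can be as simple as a bag-of-words with the hashing trick. A sketch (a real pipeline would normalise, stem, and use n-grams):

```javascript
// Map free-form customer text to a fixed-length numeric vector a model
// can consume. No vocabulary is needed, so unseen words still land in
// some bucket instead of breaking the encoder.
function hashFeatures(text, dim = 1024) {
  const vec = new Float32Array(dim);
  for (const token of text.toLowerCase().split(/\W+/).filter(Boolean)) {
    let h = 0;
    for (let i = 0; i < token.length; i++) {
      h = (h * 31 + token.charCodeAt(i)) >>> 0; // simple rolling hash
    }
    vec[h % dim] += 1;
  }
  return vec; // input for a classifier over the ~300 part classes
}
```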
I'm not sure ElasticSearch would improve much on the network. I don't know much about auto parts trading, but as a wild guess, "the large round thingy that helps change direction" would never be mapped to "steering wheel" by ES but could be learned easily by a network - provided there are at least some examples of people using that text to specify steering wheel.
You can, but don't necessarily have to, use TensorFlow.js for your network. The model could run on your server as a web service; you'd just send the customer's text over to it, and it would send back its recommendations of part SKUs and names.

Which are the best tools available for an online routing application?

So here's my question. Suppose one is about to create an online web application that takes a current location and a destination as user input, and displays as a result the most suitable (in terms of distance) of 5-6 available routes stored in a database. Given that OpenStreetMap data and OpenLayers are used, which would be the best way to make this happen?
What I am asking for is what would I need for:
1. Storing the data in a database.
2. Doing the routing calculations. If I wanted to tweak the algorithms for academic reasons and have more control over my final result, how would I do that? Do I need a particular programming language? Any good tutorials?
3. What is the difference between using pgRouting and a custom solution (like the one mentioned above)? Would doing all the coding myself be reinventing the wheel?
4. What would be best for a commercial website, where fast calculations are needed?
UPDATE: What I need is a way to connect 1. user input (as geometry points), 2. a routing algorithm I have written, and 3. the road network, and to return a result: the best way to get to a point.
Please see the list of online routers and offline routers for OSM as well as the general wiki page about routing with OSM.
If that still doesn't answer your questions, ask a more specific one.
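To give point 3 from the question above some flavour: with pgRouting, the shortest-path computation is a single SQL call, so a custom solution mainly means replacing that call with your own algorithm over the same network tables. A sketch from Node.js using node-postgres (the ways table layout follows the usual osm2pgrouting output, but the names here are illustrative):

```javascript
// Shortest path with pgRouting from Node.js. Connection settings come
// from the standard PG* environment variables.
const { Client } = require('pg');

async function route(sourceVertex, targetVertex) {
  const client = new Client();
  await client.connect();
  const res = await client.query(
    `SELECT seq, node, edge, cost
     FROM pgr_dijkstra(
       'SELECT gid AS id, source, target, cost FROM ways',
       $1, $2, true)`,
    [sourceVertex, targetVertex]
  );
  await client.end();
  return res.rows; // ordered list of nodes/edges along the path
}
```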

How to estimate search application's efficiency?

I hope it belongs here.
Can anyone please tell me whether there is a method to compare different search applications working in the same domain on the same dataset?
The problem is that they are quite different: one is a web application which looks up a database where items are grouped into categories, and the other is a rich client which searches by keywords.
Are there any standard test guides for that purpose?
There are testing methods. You may use e.g. Precision/Recall or the F-beta measure to compute an "efficiency" score. However, you need to build a reference set yourself. That means you will measure not so much the efficiency in the domain as the efficiency relative to your own judgment.
All the more reason to make sure that your reference set is representative of the data you have.
In most cases, common-sense reasoning will also get you to a result.
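Concretely, once you have a hand-built reference set of relevant items per query, the computation is only a few lines. A sketch:

```javascript
// Precision/Recall/F-beta for one query, measured against a hand-built
// reference set. retrieved and relevant are Sets of document ids.
function evaluate(retrieved, relevant, beta = 1) {
  const hits = [...retrieved].filter((id) => relevant.has(id)).length;
  const precision = retrieved.size ? hits / retrieved.size : 0;
  const recall = relevant.size ? hits / relevant.size : 0;
  const b2 = beta * beta;
  const fBeta =
    precision + recall === 0
      ? 0
      : ((1 + b2) * precision * recall) / (b2 * precision + recall);
  return { precision, recall, fBeta };
}
```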
If you want to measure performance in terms of speed, you need to formulate a set of assumed queries against the search and run them against your search engine at a given rate. That's doable with every common load-testing tool.
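And for the speed side, the "fixed query set at a given rate" idea fits in a few lines even without a dedicated tool (a sketch; the URL is illustrative):

```javascript
// Fire a fixed query set at roughly one request per intervalMs and
// record latencies; a load-testing tool does the same thing at scale.
async function loadTest(queries, intervalMs = 100) {
  const latencies = [];
  for (const q of queries) {
    const t0 = Date.now();
    await fetch('https://search.example.com/api?q=' + encodeURIComponent(q));
    latencies.push(Date.now() - t0);
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return latencies;
}
```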
