I have a case requirement where I need to find the nearest N suppliers who supply specific product types. The types are in the int range 0..1048575, which represents a hierarchy. Any supplier could have multiple points, since they could supply multiple product types.
I could store the long & lat in PostGIS and the types in an indexed int array column, and query against both with a limit of N. However, I don't believe this will be efficient, as I am not sure that PostgreSQL will use both indexes.
Another idea I have is to store the types in a third "vertical" dimension. This would create stacked "vertical" shape segments at the long & lat of each supplier. To query, I would find the nearest shapes by long & lat that also intersect the desired type on the third dimension.
Is this possible with PostGIS using, say, 3DM geometries? In other words, can I have it calculate the nearest neighbor using only the long and lat, but use all 3 dimensions for the intersection?
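For reference, the first approach would look something like the sketch below (Python + psycopg2; the table and column names suppliers, geom, product_types are assumptions, and whether the planner actually combines a GiST index on geom with a GIN index on product_types is exactly the open question):

import psycopg2

# Sketch only: table/column names are assumptions, not from the question.
query = """
    SELECT id
    FROM suppliers
    WHERE product_types && %(types)s  -- assumes a GIN index on the int[] column
    ORDER BY geom <-> ST_SetSRID(ST_MakePoint(%(lon)s, %(lat)s), 4326)
    LIMIT %(n)s;                      -- KNN ordering can use a GiST index on geom
"""

with psycopg2.connect("dbname=mydb") as conn:
    with conn.cursor() as cur:
        cur.execute(query, {"types": [42], "lon": 13.4, "lat": 52.5, "n": 10})
        nearest = cur.fetchall()

Running EXPLAIN ANALYZE on such a query would show whether the planner picks the KNN index scan with the array filter applied on top, or falls back to a different plan.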
Given an NxM array of positive integers, how would one go about selecting integers so that the sum of the selected values is maximized, subject to at most x selections in each row and y selections in each column?

This is an abstraction of a problem I am facing in making NCAA swimming lineups. Each swimmer has a time in every event that can be converted to an integer using the USA Swimming Power Points Calculator (the higher the better). Once those times are converted, I want to assign no more than 3 swimmers per event, and no more than 3 races per swimmer, such that the total sum of power scores is maximized.

I think this is similar to the weapon-target assignment (WTA) problem, but that problem allows a weapon type to attack the same target more than once (in my case, allowing a single swimmer to race the same event twice), which does not work for my use case. Does anybody know what this variation on the WTA problem is called, and if so, do you know of any solutions or resources I could look to?
Here is a mathematical model:
Data
Let a[i,j] be the data matrix
and
x: max number of selected cells in each row
y: max number of selected cells in each column
(Note: this is a bit unusual; we normally reserve the names x and y for decision variables. Such naming conventions help readability.)
Variables
δ[i,j] ∈ {0,1} are binary variables indicating if cell (i,j) is selected.
Optimization Model
max sum((i,j), a[i,j]*δ[i,j])
subject to:
sum(j, δ[i,j]) ≤ x  ∀i
sum(i, δ[i,j]) ≤ y  ∀j
δ[i,j] ∈ {0,1}
This can be fed into any MIP solver.
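As an illustration, here is a minimal sketch of this model in Python with PuLP; the data matrix a and the limits x, y are made-up example values:

import pulp

# Made-up example data: a[i][j] = score of selecting cell (i, j)
a = [[5, 1, 9],
     [8, 2, 4],
     [7, 6, 3]]
x, y = 2, 2  # max selections per row / per column

n_rows, n_cols = len(a), len(a[0])
prob = pulp.LpProblem("max_selection", pulp.LpMaximize)

# delta[i][j] = 1 if cell (i, j) is selected
delta = [[pulp.LpVariable(f"d_{i}_{j}", cat="Binary")
          for j in range(n_cols)] for i in range(n_rows)]

# Objective: maximize sum of a[i,j] * delta[i,j]
prob += pulp.lpSum(a[i][j] * delta[i][j]
                   for i in range(n_rows) for j in range(n_cols))

# Row and column limits
for i in range(n_rows):
    prob += pulp.lpSum(delta[i][j] for j in range(n_cols)) <= x
for j in range(n_cols):
    prob += pulp.lpSum(delta[i][j] for i in range(n_rows)) <= y

prob.solve()
selected = [(i, j) for i in range(n_rows) for j in range(n_cols)
            if delta[i][j].value() == 1]
print(selected, pulp.value(prob.objective))

Any other MIP interface (OR-Tools, Pyomo, or a solver's native API) would look essentially the same.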
I'm currently working with some weather data that I have as netCDF files, which I can easily read with Python's xarray library.
I would now like to get the n smallest values of my DataArray, which has 3 dimensions (longitude, latitude and time).
When I have a DataArray dr, I can just do dr.min(), maybe specify an axis, and then I get the minimum. But when I also want the second smallest, or even a variable number of smallest values, it seems not to be as simple.
What I currently do is:
with xr.open_dataset(path) as ds:
    dr = ds[selection]
    dr = dr.values.reshape(dr.values.size)
    dr.sort()
    n_smallest = dr[0:n]
which seems a bit complicated compared to the simple .min() I have to type for the smallest value.
I actually want to get the times of the respective smallest values, which I do for the single smallest with:
dr.where(dr == dr.min(), drop=True)[time].values
So is there a better way of getting the n smallest values? Or maybe even a simple way to get the times for the n smallest values?
Maybe there is a way to reduce the 3D DataArray along the longitude and latitude axes to the respective smallest values?
I just figured out that there really is a reduce function for DataArray that allows me to reduce along longitude and latitude. As I don't reduce the time dimension, I can then use the sortby function and get the DataArray with the minimum value for each day together with its respective time:
with xr.open_dataset(path) as ds:
    dr = ds[selection]
    dr = dr.reduce(np.min, dim=[longitude, latitude])
    dr = dr.sortby(dr)
which is obviously not shorter than my original code, but it perfectly satisfies my needs.
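For completeness, here is a self-contained sketch of this approach on synthetic data (the dimension names and the variable n are assumptions), including how to read off the n smallest values and their times:

import numpy as np
import xarray as xr

# Synthetic stand-in for the real data: dims (longitude, latitude, time)
dr = xr.DataArray(np.random.rand(4, 3, 10),
                  dims=["longitude", "latitude", "time"],
                  coords={"time": np.arange(10)},
                  name="var")

n = 3
mins = dr.reduce(np.min, dim=["longitude", "latitude"])  # 1-D along time
mins = mins.sortby(mins)                                 # ascending by value
print(mins[:n].values)           # the n smallest values
print(mins[:n]["time"].values)   # their corresponding times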
In my training set I have 24 feature vectors (FVs). Each FV contains 2 lists. When I try to fit this with model = LogisticRegression() or model = KNeighborsClassifier(n_neighbors=k), I get this error: ValueError: setting an array element with a sequence.
In my dataframe, each row represents one FV. There are 3 columns. The first column contains a list of an individual's heart rate values, the second a list of the corresponding activity data, and the third the target. Visually, it looks something like this:
HR                          ACT                         Target
[0.5018, 0.5106, 0.4872]    [0.1390, 0.1709, 0.0886]    1
[0.4931, 0.5171, 0.5514]    [0.2423, 0.2795, 0.2232]    0
Should I:
1. Join both lists to form one long FV, or
2. Expand both lists such that each column represents one value. In other words, if there are 5 items in HR and ACT data for a FV, the new dataframe would have 10 columns for features and 1 for Target.
How do logistic regression and KNNs handle input data? I understand that logistic regression combines the inputs linearly using weights or coefficient values. But I am not sure what that means when it comes to lists vs. dataframe columns. Does it automatically convert corresponding values of dataframe columns to a list before transforming? Is there a difference between methods 1 and 2?
Additionally, if a long list is required, should I have the long list as [HR,HR,HR,ACT,ACT,ACT] or [HR,ACT,HR,ACT,HR,ACT]?
You should go with option 2:
Expand both lists such that each column represents one value. In other words, if there are 5 items in HR and ACT data for a FV, the new dataframe would have 10 columns for features and 1 for Target.
You should then select the feature columns from the dataframe and pass them as X, and the target column as y, to the model's fit function.
Sklearn's models accept inputs of shape [n_samples, n_features]; after following the 2nd option you proposed, your training dataframe will be 2D with shape [n_samples, 10].
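For illustration, here is a minimal sketch of option 2 in pandas/sklearn, using the two example rows from the question:

import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.DataFrame({
    "HR":     [[0.5018, 0.5106, 0.4872], [0.4931, 0.5171, 0.5514]],
    "ACT":    [[0.1390, 0.1709, 0.0886], [0.2423, 0.2795, 0.2232]],
    "Target": [1, 0],
})

# Expand each list column into one column per element
hr = pd.DataFrame(df["HR"].tolist()).add_prefix("HR_")
act = pd.DataFrame(df["ACT"].tolist()).add_prefix("ACT_")
X = pd.concat([hr, act], axis=1)   # shape: [n_samples, n_features]
y = df["Target"]

model = LogisticRegression()
model.fit(X, y)

As for the ordering question: for logistic regression the column order ([HR,HR,HR,ACT,ACT,ACT] vs. [HR,ACT,HR,ACT,HR,ACT]) makes no difference to the fit, since each column gets its own coefficient; it only needs to be consistent across rows. The same holds for KNN with Euclidean distance, where the sum over features does not depend on their order.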
I have a distance value between two objects. I need an algorithm to check whether the measured distance can occur as the distance between some pair of objects in the grid pattern shown in the image.
(Image: grid for verification)
This is a grid with square cells. Any distance d between two nodes of such a grid (expressed in units of the cell size) must satisfy the condition

d^2 = a^2 + b^2

for some integers a and b. So if the squared distance is an integer and you can represent it as a sum of two integer squares, then the objects can be placed at grid nodes.
There is a mathematical criterion (the sum of two squares theorem): a number P is not representable as a sum of two squares if its factorization into primes contains any factor of the form 4n+3 raised to an odd power.
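A small sketch of that criterion in Python, using trial-division factorization (fine for moderate P):

# True iff p can be written as a^2 + b^2 with integers a, b:
# every prime factor of the form 4n+3 must occur to an even power.
def is_sum_of_two_squares(p: int) -> bool:
    d = 2
    while d * d <= p:
        if p % d == 0:
            power = 0
            while p % d == 0:
                p //= d
                power += 1
            if d % 4 == 3 and power % 2 == 1:
                return False
        d += 1
    # Any remaining factor > 1 is prime and occurs exactly once
    return not (p > 1 and p % 4 == 3)

print(is_sum_of_two_squares(25))  # True:  25 = 3^2 + 4^2
print(is_sum_of_two_squares(21))  # False: 21 = 3 * 7, both of form 4n+3

For a measured (possibly inexact) distance, d^2 would first have to be rounded to an integer within the measurement tolerance before applying the test.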
I am working on a project that requires calculating the minimum distance between two strings. The maximum length of each string can be 10,000 (m), and we have around 50,000 (n) strings. I need to find the distance between each pair of strings. I also have a weight matrix that contains the weight for each character pair. For example, the weight between (a,a) = (a,b) = 0.
Just iterating over all pairs of strings takes O(n^2) time. I have seen algorithms that take O(m) time to compute one distance, so the overall time complexity becomes O(n^2 * m). Are there any algorithms that can do better than this using some pre-processing? It's actually the same problem as autocorrect.
Are there algorithms that store all the strings in a data structure and then let us query the approximate distance between two strings from that structure? Constructing the data structure may take O(n^2), but query processing should be done in less than O(m).
For example, take s1 = abcca and s2 = bdbbe. If we follow the above weight matrix and calculate the Euclidean distance between the two, we get:

sqrt(0^2 + 9^2 + 9^2 + 9^2 + 342^2)
Context: I need to cluster time series, and I have converted each time series to a SAX representation with around 10,000 points. In order to cluster, I need to define a distance matrix, so I need to calculate the distance between two strings in an efficient way.
Note: All strings are of the same length, and the alphabet size is 5.
https://web.stanford.edu/class/cs124/lec/med.pdf
http://stevehanov.ca/blog/index.php?id=114
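Since all strings are the same length, the distance in the example above is a per-position table lookup combined Euclidean-style, which is O(m) per pair. A minimal sketch in numpy; the weight matrix W below is made up, apart from the entries the example implies (weight(a,b) = 0, weight(b,d) = weight(c,b) = 9, weight(a,e) = 342):

import numpy as np

alphabet = "abcde"
idx = {c: i for i, c in enumerate(alphabet)}

# W[i, j] = weight between alphabet[i] and alphabet[j].
# Made-up symmetric example, consistent with the question's example distance.
W = np.array([
    [  0,   0,   9,   9, 342],
    [  0,   0,   9,   9, 342],
    [  9,   9,   0,   0,   9],
    [  9,   9,   0,   0,   9],
    [342, 342,   9,   9,   0],
], dtype=float)

def weighted_distance(s1: str, s2: str) -> float:
    a = np.array([idx[c] for c in s1])
    b = np.array([idx[c] for c in s2])
    w = W[a, b]                             # per-position weights, O(m)
    return float(np.sqrt(np.sum(w ** 2)))   # Euclidean combination

print(weighted_distance("abcca", "bdbbe"))  # sqrt(0 + 81 + 81 + 81 + 342^2)

Encoding each string to its integer index array once, up front, removes the per-pair encoding cost and leaves only the vectorized lookup and sum for each of the n^2/2 pairs.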