SQLAlchemy Search Algorithms

I am working on a Pyramid web application that uses SQLAlchemy.
I have a database of 7000 US universities and an input box on the front end for entering a university. The input box uses jQuery UI's autocomplete, which calls my Pyramid app via AJAX to get a list of options.
This is how I currently fetch results based on the term:
session.query(University).filter(University.name.like('%' + term + '%')).all()
This works fairly well; however, I would also like to support abbreviations: when a user enters UCLA, I would like to find University of California - Los Angeles; for FSU, Florida State University; and so forth.
Other than iterating over my entire list of universities (~8000) and manually deciding whether each name matches, is there a way to specify a matching function in SQLAlchemy's query/filter?
Thanks!

SQLAlchemy is essentially a front end to SQL. If your query isn't something that can easily be expressed in SQL, you're unlikely to gain anything by twisting it in that direction compared with just iterating manually: every row still has to be pulled out and compared, whether your code does it or the database does.
However, if when you search for X you also want to search for a Y that can be derived from X, look up Y first, then query for both using the OR operator.
I.e., don't do:
get all rows, and for each row, see if it matches X or Y
instead do:
get all rows that match X plus all rows that match Y.
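A minimal sketch of that advice, using the standard-library sqlite3 module instead of SQLAlchemy so it runs on its own. It assumes a hypothetical `acronym` column that is filled in when each university is inserted (the `acronym()` helper and its stopword list are also assumptions, not part of the question), so deriving Y from X becomes an indexed lookup rather than a per-row Python comparison. In SQLAlchemy the same condition would be written with `sqlalchemy.or_`, e.g. `filter(or_(University.name.like('%' + term + '%'), University.acronym == term.upper()))`.

```python
import sqlite3

# Filler words that usually contribute no letter to an acronym (an assumption; tune for your data).
STOPWORDS = {"of", "at", "the", "and", "in"}


def acronym(name):
    """'University of California - Los Angeles' -> 'UCLA', 'Florida State University' -> 'FSU'."""
    words = name.replace("-", " ").split()
    return "".join(w[0].upper() for w in words if w.lower() not in STOPWORDS)


def build_db(names):
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: the acronym (Y) is derived once, at insert time.
    conn.execute("CREATE TABLE university (name TEXT, acronym TEXT)")
    conn.execute("CREATE INDEX ix_university_acronym ON university (acronym)")
    conn.executemany(
        "INSERT INTO university (name, acronym) VALUES (?, ?)",
        [(n, acronym(n)) for n in names],
    )
    return conn


def search(conn, term):
    # One query: rows matching X (substring of the name) OR rows matching Y (the acronym).
    rows = conn.execute(
        "SELECT name FROM university WHERE name LIKE ? OR acronym = ? ORDER BY name",
        ("%" + term + "%", term.upper()),
    )
    return [name for (name,) in rows]


conn = build_db([
    "University of California - Los Angeles",
    "Florida State University",
    "Florida International University",
])
print(search(conn, "UCLA"))     # ['University of California - Los Angeles']
print(search(conn, "fsu"))      # ['Florida State University']
print(search(conn, "Florida"))  # ['Florida International University', 'Florida State University']
```

Storing the derived value means the database does the matching with an index, which is exactly the "get all rows that match X plus all rows that match Y" shape described above.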

Related

How to allow REST filters for non-DB fields in Django

My problem is this:
I'm working on a Django project that has a lot of DB tables representing many different objects. The project has a GUI that is mostly a table, a rough representation of each DB table. This table I got from Google Images can be used as an example.
Unlike this table, mine has filters on top of each column, so, for example, you can type a name you're looking for and only the relevant rows are shown. Here is where the problem starts: not all columns in the GUI are really columns in the DB. For example, let's say the "Location" column isn't a DB column but a property on the Django model, like this:
@property
def location(self):
    return f"{self.city}, {self.state}"
So, obviously, sending this URL to my Django backend won't work as-is:
https://mywebsite/api/people/?location__icontains=Chicago
My workaround is to override filter_queryset in the view to handle this specific case, something like this:
from django.db.models import Q

def filter_queryset(self, queryset):
    # Keep the normal filter backends, then handle the computed "location" column by hand.
    queryset = super().filter_queryset(queryset)
    value = self.request.query_params.get("location__icontains")
    if value:
        queryset = queryset.filter(Q(city__icontains=value) | Q(state__icontains=value))
    return queryset
The issue is that this doesn't really cover all cases: it only works if the user's filter input is either a city or a state, not a combination like "ago, IL" (which should yield "Chicago, IL" in the results). I also have too many of these cases, since we developers have added tons of properties and serializer fields to the display over the years, and customers expect to be able to filter on them. Ideally, I'm looking for a way to handle all of these generically (perhaps by filtering on the serialized entries rather than on the DB rows?).
I've searched for this problem using many different phrasings but found no solution. For now, we have to implement a hack for each specific case.
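A minimal sketch of the "filter on the computed values rather than the DB rows" idea from the question, in plain Python. The hypothetical `filter_objects` helper applies `<attr>__icontains` parameters through `getattr`, so it works for `@property` values like `location` as well as real fields. It is an illustration under assumptions, not a DRF feature: it only handles `icontains`, and because it runs in Python it has to load every candidate row, so it suits small tables or an already narrowed queryset.

```python
from dataclasses import dataclass


def filter_objects(objects, params):
    """Keep objects whose attribute contains the value, for every '<attr>__icontains' param.

    getattr() resolves plain fields and @property values alike, so computed
    columns such as `location` can be filtered without a per-case hack.
    """
    result = list(objects)
    for key, needle in params.items():
        attr, _, lookup = key.partition("__")
        if lookup != "icontains":
            continue  # other lookups are out of scope for this sketch
        needle = str(needle).lower()
        result = [o for o in result if needle in str(getattr(o, attr, "")).lower()]
    return result


# Hypothetical stand-in for the Django model from the question.
@dataclass
class Person:
    name: str
    city: str
    state: str

    @property
    def location(self):
        return f"{self.city}, {self.state}"


people = [Person("Ann", "Chicago", "IL"), Person("Bob", "Springfield", "IL"), Person("Cy", "Austin", "TX")]
matches = filter_objects(people, {"location__icontains": "ago, IL"})
print([p.name for p in matches])  # ['Ann']
```

For a single computed column such as `location`, the work can instead stay in the database by annotating the queryset with Django's `Concat` (from `django.db.models.functions`) and filtering on the annotation, e.g. `queryset.annotate(location_text=Concat("city", Value(", "), "state")).filter(location_text__icontains=value)`; giving the annotation a name that differs from the property avoids clashing with it.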

Finding organization and industry/sector from a string in DBpedia

I am generating a short list of 10 to 20 strings that I want to look up on DBpedia, to see whether they have an organization tag and, if so, return the industry/sector tag. I have been looking at the SPARQLWrapper queries on their website but am having trouble constructing one that returns the organization and sector/industry for my string. Is there a way to do this?
If I use the code below, I get what I think is a list of industry types rather than the industry of the company.
from SPARQLWrapper import SPARQLWrapper, JSON
sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery("""
SELECT ?industry WHERE
{ <http://dbpedia.org/resource/IBM> a ?industry}
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
Instead of looking at queries which are meant to help you understand the querying tool, you should start by looking at the data which is being queried. For instance, just click http://dbpedia.org/resource/IBM, and look at the properties (the left hand column) to see its rdf:type values (of which there are MANY)!
Note that IBM is not described as a ?industry. IBM is described as a <http://dbpedia.org/resource/Public_company> (among other things). On the other hand, IBM is also described as having three values for <http://dbpedia.org/ontology/industry> --
<http://dbpedia.org/resource/Cloud_computing>
<http://dbpedia.org/resource/Information_technology>
<http://dbpedia.org/resource/Cognitive_computing>
I don't know whether these are what you're actually looking for or not, but hopefully what I've done above will start you down the right path to whatever you do want to get out of DBpedia.
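Putting the answer's two observations together, a sketch under assumptions: a query that requires the resource to be typed as `dbo:Organisation` (DBpedia's ontology class for organizations) and returns its `dbo:industry` values. This assumes the resource lists that class among its rdf:type values; if it only lists a subclass such as `dbo:Company`, the pattern would need to become `a/rdfs:subClassOf* dbo:Organisation`. The helper names and the naive string-to-resource mapping (spaces replaced by underscores) are hypothetical; strings that aren't exact DBpedia resource names would need a label lookup first. SPARQLWrapper is imported only inside the network function, so query building and result parsing can be checked offline.

```python
DBO = "http://dbpedia.org/ontology/"
RESOURCE = "http://dbpedia.org/resource/"
UNSAFE = set('<>"{}|\\^` ')  # characters that would break out of the <...> IRI


def build_query(name):
    """SPARQL for the dbo:industry values of `name`, only if it is typed as dbo:Organisation."""
    local = name.strip().replace(" ", "_")  # naive: assumes the string is the resource name
    if not local or UNSAFE & set(local):
        raise ValueError(f"not a usable resource name: {name!r}")
    return (
        f"PREFIX dbo: <{DBO}>\n"
        "SELECT ?industry WHERE {\n"
        f"  <{RESOURCE}{local}> a dbo:Organisation ;\n"
        "      dbo:industry ?industry .\n"
        "}"
    )


def parse_industries(results):
    """Pull the ?industry IRIs out of a SPARQL JSON results document."""
    return [b["industry"]["value"] for b in results["results"]["bindings"]]


def industries_for(name):
    # Network call; needs the SPARQLWrapper package, imported lazily on purpose.
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("http://dbpedia.org/sparql")
    sparql.setQuery(build_query(name))
    sparql.setReturnFormat(JSON)
    return parse_industries(sparql.query().convert())
```

Calling `industries_for(name)` for each of the 10 to 20 strings then returns an empty list for anything that isn't typed as an organization, and the industry IRIs otherwise.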

ExpressionEngine - passing multiple categories as URL segments

I'm trying to create a product filter with deep-linking capability. Essentially, I want the user to be able to filter my product list on multiple categories and have the URL reflect the filtering they've done.
So it would start as:
www.site.com/products/
My first level of category filtering already works. So I can use EE's regular handling of URL segments to get to my first level of filtering. For instance:
www.site.com/products/leatherthongs
That returns a filtered subset showing only a spectacular collection of leather thongs. But now I want the user to be able to filter on another category, color for instance. This is where things stop working.
EE's way of handling multiple categories inside templates (with ampersands or pipes) doesn't work in the URL:
www.site.com/products/leatherthongs&red
Nor does any variation that I've tried.
My next move is to create a simple raw PHP method that can capture regular querystring parameters and then inject them into the {entries} tag before rendering. Not very difficult, but quite ugly. I would love to know if there is a way to handle multiple categories in the URL natively.
Thanks for your time.
Have you considered using Low's Seg2Cat add-on? I'm not sure how complex you want to make this, but it seems that you could specify something like category='{segment_2_category_id}{if segment_3}&{segment_3_category_id}{/if}' in your channel:entries loop (in EE's category parameter, & requires entries to be in all of the listed categories, while | matches any of them).
This exact syntax is untested, but I have had success with a similar solution in the past.

Designing library to generate dynamic MDX query

We are generating MDX queries dynamically. We pass a list of columns ([DimensionName].[Attribute.Name] format), rows ([DimensionName].[Attribute.Name] format) and filters ([DimensionName].[Attribute.Name].[Member Name] format), along with other inputs such as the cube name, page number, measure, etc.
This information is passed to a C# library, where we use a lot of if/else conditions to process the input and generate the MDX query as a string. It requires a lot of string manipulation.
You could say it has a workflow: after going through each condition, the system generates some output. I am wondering if there is a smarter way to design this library.
I want to remove the if/else conditions.
I want to make it more readable.
I want to make it more manageable.
My question is: is there a design principle I can use? I have thought of using Windows Workflow. Please share your suggestions.
I'm actually on here to see if someone has done just that, so I don't have to. No luck so far. But off the top of my head, you might want to look at some form of rules engine that evaluates the state of the target string and adds your various criteria.
I haven't even started looking into MDX syntax; I'm not that far along. But if I wanted to create an engine that builds SQL queries, I'd look at the parts (simplest case first): you need a list of columns, a table, and a list of WHERE clauses. So you could have two or three basic engine classes, one that takes a list of strings (or, better yet, a list of expressions) and concatenates them (or evaluates and then concatenates them). If the target string is empty, then targetString = "SELECT " + x; otherwise targetString += ", " + x. Then do something similar with the WHERE expressions. You can get considerably fancier by building classes that implement the different forms of WHERE expressions, and so on. Ultimately, you'd pass your engine something like
new MySqlEngine(new[] { "FirstName", "LastName", "GirlFriendsAddress" },
    new[] { new EqualsExpression("FirstName", "Brown"), new EqualsExpression("LastName", "Dynamite") },
    "People");
and it would return
"SELECT FirstName, LastName, GirlFriendsAddress FROM People WHERE FirstName = 'Brown' AND LastName = 'Dynamite'"
I would highly recommend using Expressions to evaluate properties on a target model that matches your table. Then, with MySqlEngine(...), you wouldn't have to provide the table name, because your model could be named the same as the table, and the only strings you'd use would be the target values of the WHERE clauses.
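A minimal Python sketch of the engine described above; the class and function names are hypothetical, and, like the answer, it targets SQL rather than MDX. Each expression type renders itself, so supporting a new kind of clause means adding a class rather than another if/else branch, and `join` handles the separators that the "is the target string empty?" check was managing. Unlike the expected output above, values come back as bound parameters (`?` placeholders) instead of being quoted into the string; column and table names are still concatenated and must come from trusted code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Equals:
    """column = ?; the value travels as a bound parameter, not spliced into the SQL."""
    column: str
    value: object

    def render(self):
        return f"{self.column} = ?", [self.value]


@dataclass(frozen=True)
class BoolOp:
    """AND/OR over any renderable parts; nesting BoolOps gives arbitrary WHERE trees."""
    op: str
    parts: tuple

    def render(self):
        sqls, params = [], []
        for part in self.parts:
            sql, part_params = part.render()
            sqls.append(sql)
            params.extend(part_params)
        return "(" + f" {self.op} ".join(sqls) + ")", params


def build_select(columns, table, where=None):
    # join() replaces the "empty string? 'SELECT ' : ', '" branching.
    sql = "SELECT " + ", ".join(columns) + " FROM " + table
    params = []
    if where is not None:
        clause, params = where.render()
        sql += " WHERE " + clause
    return sql, params


sql, params = build_select(
    ["FirstName", "LastName", "GirlFriendsAddress"],
    "People",
    BoolOp("AND", (Equals("FirstName", "Brown"), Equals("LastName", "Dynamite"))),
)
print(sql)     # SELECT FirstName, LastName, GirlFriendsAddress FROM People WHERE (FirstName = ? AND LastName = ?)
print(params)  # ['Brown', 'Dynamite']
```

The same split (small renderable pieces plus a builder that only joins them) would carry over to MDX axes and slicers, but that mapping is not worked out here.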
I know this is not the engine you want, but I don't know MDX yet, so you'll have to use this as an analogy.
Final thoughts: DO NOT USE Windows Workflow. You will want to kill yourself halfway through, and if you make it all the way through, there will be developers cursing your name for many years to come.
Good luck
Oh, and if you build this, please open-source it and tell me, so I don't have to do it.

Implementing a location search in ElasticSearch

I have a problem with location queries returning erroneous results in ElasticSearch.
In our system, a business search engine, every search takes two inputs, a location and a query string, e.g.:
q=sushi
location=Greenwich Village, New York, New York
I want the search to show me sushi in Greenwich Village first, then sushi outside of Greenwich Village, but to never show me non-sushi results.
The problem is that, because of the location query, anything in Greenwich Village gets matched: lawyers, doctors, whatever. I'd like to say the following to ElasticSearch:
If q matches, then location doesn't have to (it's OK to return sushi outside of Greenwich Village), but if location matches, don't return it unless q matches also (not OK to return non-sushi businesses in Greenwich Village).
Anyone have any thoughts on how to do this?
It sounds like you want to search for "sushi" (you don't want non-sushi results) but sort your results by location (you want Greenwich Village results first).
If you are storing locations as geo points, you can simply use distance to sort your results.
If location is just a field, and you can only know whether the business is inside or outside a location, you can use the Custom Filters Score query to boost the relevance of results in the desired location. The query part should contain the search for "sushi", and the filters part should contain the search for the location.
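The same intent ("q must match; location only boosts, or only orders") can be sketched as request bodies built in Python. This swaps in a different formulation from the answer's: the Custom Filters Score query comes from early Elasticsearch releases and was later replaced by `function_score`, whereas a `bool` query with the text match in `must` and the location in `should` expresses the same rule, because `should` clauses are optional and only add to the score when a `must` clause is present. The field names `name`, `place_ids` and `location` (a geo_point) and the boost value are hypothetical.

```python
import json


def sushi_near(q, place_id, place_boost=2.0):
    """q is required; matching the place only raises the score."""
    return {
        "query": {
            "bool": {
                "must": {"match": {"name": q}},
                "should": {"term": {"place_ids": {"value": place_id, "boost": place_boost}}},
            }
        }
    }


def sushi_by_distance(q, lat, lon):
    """Same required text match, ordered by distance from the place's center point."""
    return {
        "query": {"match": {"name": q}},
        "sort": [
            {"_geo_distance": {"location": {"lat": lat, "lon": lon}, "order": "asc", "unit": "km"}}
        ],
    }


print(json.dumps(sushi_near("sushi", "greenwich-village"), indent=2))
```

The first body covers the "inside or outside a location" case; the second covers the geo-point case, where sorting by distance replaces the relevance order.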
I incorporated the information in this post and here to come up with the following solution:
1. Index every 'place' (neighborhood, city, etc.) with a center point, and also index the coordinates of every business.
2. On each business, index the ids of the places that contain it.
3. Use a sub-search to convert the text entered into the location bar into a place record.
4. Use a CustomScoreQuery to modify every result's score by the following formula, which was worked out by trial and error:
   new_score = old_score / (1 + distance_between_place_centerpoint_and_result)^3
5. Also query the place id found in step 3 against the place_ids field as a 'should' boolean clause. This gives a flat boost to everything that actually falls within the confines of the specified place.
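The scoring in steps 4 and 5 can be simulated outside Elasticsearch to see how it ranks results. A rough sketch under assumptions: distances are great-circle kilometres (the post doesn't give a unit), the flat boost from step 5 is modelled as a hypothetical additive `place_boost`, and the coordinates and scores in the example are made up for illustration.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def rescore(old_score, distance_km, in_place, place_boost=0.5):
    # Step 4: new_score = old_score / (1 + distance)^3
    score = old_score / (1 + distance_km) ** 3
    # Step 5, modelled here as an additive flat boost for results inside the place.
    return score + (place_boost if in_place else 0.0)


def rank(results, center, place_id, place_boost=0.5):
    def key(r):
        d = haversine_km(center[0], center[1], r["lat"], r["lon"])
        return rescore(r["score"], d, place_id in r["place_ids"], place_boost)

    return sorted(results, key=key, reverse=True)


center = (40.7336, -74.0027)  # made-up center point for "Greenwich Village"
results = [
    {"name": "Sushi in Brooklyn", "score": 1.5, "lat": 40.6782, "lon": -73.9442, "place_ids": []},
    {"name": "Sushi in the Village", "score": 1.0, "lat": 40.7340, "lon": -74.0020,
     "place_ids": ["greenwich-village"]},
]
print([r["name"] for r in rank(results, center, "greenwich-village")])
# ['Sushi in the Village', 'Sushi in Brooklyn']
```

The cubic denominator falls off quickly: at 1 km a result keeps 1/8 of its text score, which is also why businesses near the center point end up favoured, as noted below.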
A side effect of this strategy is that businesses near the center point of the place are considered more relevant; in my opinion, it is arguable whether that is correct. Other than that, it has worked quite well.
Thanks to imitov for his insight that helped me come up with this solution.
