Unable to order Haystack/Whoosh results (and it's extremely slow) - django-haystack

I'm using Haystack and Whoosh to search a custom app with city data from the Geonames project.
I only have a small amount of the Geonames city data imported (22917 records). I'd like to order the results by a city's population and I'm having trouble getting good results.
When I use order_by on my SearchQuerySet, the results are extremely slow. It also orders properly against the 'name' field but not 'population', so I think I'm probably just doing something wrong.
Here's the search index:
class EntryIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    name = indexes.CharField(indexed=False, model_attr='ascii_name')
    population = indexes.CharField(indexed=False, model_attr='population')
    django_id = indexes.CharField(indexed=False, model_attr='id')

    def get_model(self):
        return Entry

    def index_queryset(self):
        return self.get_model().objects.all()
Here's the template:
{{ object.ascii_name }}
{{ object.alternate_names }}
{{ object.country.name }}
{{ object.country.iso }}
{{ object.admin1_division.ascii_name }}
{{ object.admin1_division.name }}
{{ object.admin1_division.code }}
{{ object.admin2_division.ascii_name }}
{{ object.admin2_division.name }}
Here's the relevant view code:
query = request.GET.get('q', '')
results = SearchQuerySet().models(Entry).auto_query(query).order_by('population')
When I take the order_by off the query, it returns in less than one second. With it on, it takes almost 10 seconds to complete, and the results are not ordered by population. Ordering by name works, but it also takes ~10 seconds.
Note: I've also tried with the built-in Haystack search view, and it's very slow when I try to order by population:
qs = SearchQuerySet().order_by('-population')
urlpatterns = patterns('',
    ...
    url(r'^demo2/$', SearchView(searchqueryset=qs)),
)

I'm doing nearly the same thing, and ordering works fast and correctly for me.
The only thing you're doing that differs significantly is:
query = request.GET.get('q', '')
results = SearchQuerySet().models(Entry).auto_query(query).order_by('population')
Since you specify a request, I'm assuming you've created your own view. You shouldn't need a custom view. I have this implemented with this in my urls.py:
from haystack.forms import ModelSearchForm
from haystack.query import SearchQuerySet
from haystack.views import SearchView, search_view_factory
sqs = SearchQuerySet().models(MyModel).order_by('-weight')
urlpatterns += patterns('',
    url(r'^search/$', search_view_factory(
        view_class=SearchView,
        template='search/search.html',
        searchqueryset=sqs,
        form_class=ModelSearchForm
    ), name='search'),
)

I found I could not order results using order_by either. I was getting what seemed like a strange partial sorting. I eventually realised that the default ordering was by relevance ranking. The order_by I was using was presumably only sorting within each rank. This point is not really brought out in the Haystack documentation.
I guess the lesson is probably that if you want your results order to ignore relevance you need to post process your results before displaying them.
Probably a bit off topic, but I was a little surprised your index population field is a CharField. Does this match with your model?
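If population really is indexed as text, one plausible reason for the odd ordering is that strings sort character by character rather than by value. A minimal sketch in plain Python, with made-up population figures and no Haystack involved:

```python
# Hypothetical population figures, held as strings the way a CharField would hold them.
populations_as_text = ["9000", "120000", "35000", "8500000"]

# String sorting compares character by character, so "120000" sorts before "9000".
text_order = sorted(populations_as_text)

# Converting to int for the comparison gives the numeric order most people expect.
numeric_order = sorted(populations_as_text, key=int)

print(text_order)
print(numeric_order)
```

In Haystack terms this suggests trying indexes.IntegerField for population instead of indexes.CharField and then rebuilding the index, though I have not verified that this alone fixes the slowness.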

I know I'm three years late, but recently I faced the same issue with a project I've been given.
I guess the only problem is the indexed=False parameter you are passing to the population CharField.
I fixed my problem by removing that.

Related

Show only product with special prices

Is it possible to show only products with special prices?
I put {% if product.special %} in my product_card.twig, and that works fine but then my pagination doesn't work correctly. It still shows the total number of products that belongs to that category.
I have 5 products in some category but pagination says "Showing 1 to 9 of 9 (1 pages)".
Is there any other way to achieve this?
You should create a new model function that runs the query against the database, something like:
public function getSpecialByCategory() {
    // your query here
}
Then call it from the controller and have the controller pass the result to your Twig file.
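The pager is off because the total is counted before the template hides the non-special products. Here is a language-neutral sketch of the idea in Python rather than PHP: filter first, then count and slice. The product rows and the paginate helper are invented for illustration and are not OpenCart APIs.

```python
import math

# Made-up catalogue rows; "special" is None when a product has no special price.
products = [
    {"name": "A", "price": 10.0, "special": 8.0},
    {"name": "B", "price": 12.0, "special": None},
    {"name": "C", "price": 15.0, "special": 11.0},
    {"name": "D", "price": 20.0, "special": None},
]


def paginate(items, page, per_page):
    """Return (items on this page, total item count, total page count)."""
    total = len(items)
    pages = max(1, math.ceil(total / per_page))
    start = (page - 1) * per_page
    return items[start:start + per_page], total, pages


# Filter before paginating so the totals describe only what is displayed.
specials = [p for p in products if p["special"] is not None]
page_items, total, pages = paginate(specials, page=1, per_page=9)
print(f"Showing {len(page_items)} of {total} ({pages} pages)")
```

In the real shop the filtering would happen in the SQL of the model function, with a matching count query feeding the pagination.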

TYPO3: Performance issue with pagination

I am currently building some kind of video channel based on an extension I created. It consists of videos and playlists that contain videos (obviously).
I have to create a page which contains a list of videos AND playlists by category. You can also sort those items by date. Finally, the page is paginated with infinite scrolling that should load items 21 at a time.
To do so, I created on both Video and Playlist repositories a "findByCategory" function which is really simple :
$query = $this->createQuery();
return $query->matching($query->equals('categorie.uid',$categoryUid))->execute()->toArray();
Once I requested the items I need, I merge them in one array and do my sorting stuff. Here is my controller show action :
if ($this->request->hasArgument('sort'))
    $sort = $this->request->getArgument('sort');
else
    $sort = 'antechrono';

//Get videos in repositories
$videos = $this->videoRepository->findByCategorie($categorie->getUid());
$playlists = $this->playlistRepository->findByCategorie($categorie->getUid());

//Merging arrays then sort it
if ($videos && $playlists)
    $result = array_merge($videos, $playlists);
else if ($videos)
    $result = $videos;
else if ($playlists)
    $result = $playlists;

if ($sort == "chrono")
    usort($result, array($this, "sortChrono"));
else if ($sort == "antechrono" || $sort == null)
{
    usort($result, array($this, "sortAnteChrono"));
    $sort="antechrono";
}

$this->view->assignMultiple(array('categorie' => $categorie, 'list' => $result, 'sort' => $sort));
Here is my view :
<f:widget.paginate objects="{list}" as="paginatedList" configuration="{addQueryString: 'true', addQueryStringMethod: 'GET,POST', itemsPerPage: 21}">
    <div class="videos row">
        <f:for each="{paginatedList}" as="element">
            <f:render partial="Show/ItemCat" arguments="{item: element}"/>
        </f:for>
    </div>
</f:widget.paginate>
The partial render shows stuff including a picture used as a cover. So I need at least this relation in the view.
This works fine and shows only the items from the category that is requested. Unfortunately I have a huge performance issue: I tried to show a category that contains more than 3000 records and it takes about one minute to load. That's a little bit long.
By f:debugging my list variable, I see that it contains every record even though it shouldn't be the case (that's the point of pagination...). So the first question is: is there something wrong in the way I did my pagination?
I tried to simplify my requests by enabling the rawQuery thing ($query->execute(true)) : I get way better performance, but I can't get the link for the pictures (in my view, I get 1 or 0 but not the picture's uid...). Second question : is there a way to fix this issue ?
I hope my description is clear enough. Thanks for your help :-)
When you execute a query, it will not actually fetch the data from the database until the results are accessed. If the paginate widget gets a query result it will add limits and offset to the query and then fetch the data from the database, so you will only get the records that are shown on a page.
In your case you added toArray() after execute(), which accesses the results, so the data is fetched from the database and you get all records. The best solution I can think of is to combine the 2 tables into 1 so you can do it with a single query and don't have to merge and order them in PHP.
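The difference between handing the paginator a lazy result and handing it an array can be illustrated with a toy class. This is a Python sketch of the concept only; LazyResult, set_window and to_list are invented names and not the Extbase API.

```python
class LazyResult:
    """Toy stand-in for a lazy query result (not the TYPO3 class)."""

    def __init__(self, rows):
        self._rows = rows      # pretend this list is the database table
        self.offset = 0
        self.limit = None
        self.fetched = 0       # how many rows were actually loaded

    def set_window(self, offset, limit):
        # A paginator can narrow the query before anything is fetched.
        self.offset, self.limit = offset, limit
        return self

    def to_list(self):
        # Accessing the data is what triggers the fetch.
        end = None if self.limit is None else self.offset + self.limit
        rows = self._rows[self.offset:end]
        self.fetched = len(rows)
        return rows


table = list(range(3000))

eager = LazyResult(table)
eager.to_list()                      # like calling toArray() straight away

paged = LazyResult(table).set_window(offset=0, limit=21)
paged.to_list()                      # like letting the paginator fetch one page

print(eager.fetched, paged.fetched)
```

The eager variant loads all 3000 rows while the windowed one loads 21, which is the saving the paginate widget can only deliver if it receives the unevaluated result.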
As long as you sort the data after the query you have to handle all data (request all records and especially resolve all relations).
Try to sort the data in the query itself (order by), so you could restrict the data to only those records which are needed for the current 'page' (limit <offset>,<number>).
Here a complex query with join and limit could be faster than a full query and filtering in PHP.
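As a rough illustration of sorting and limiting inside the query, the sketch below uses Python's built-in sqlite3 module with two invented tables. The table names, column names and the fetch_page helper are assumptions for the demo, not the real TYPO3 schema; the point is that UNION ALL plus ORDER BY and LIMIT/OFFSET returns a single page across both tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE video (uid INTEGER PRIMARY KEY, title TEXT, category INTEGER, created TEXT);
    CREATE TABLE playlist (uid INTEGER PRIMARY KEY, title TEXT, category INTEGER, created TEXT);
""")

# 28 videos dated in January and 28 playlists dated in February, all in category 1.
conn.executemany(
    "INSERT INTO video (title, category, created) VALUES (?, ?, ?)",
    [(f"video {i}", 1, f"2020-01-{i:02d}") for i in range(1, 29)],
)
conn.executemany(
    "INSERT INTO playlist (title, category, created) VALUES (?, ?, ?)",
    [(f"playlist {i}", 1, f"2020-02-{i:02d}") for i in range(1, 29)],
)


def fetch_page(category, page, per_page=21):
    """Return one page of videos and playlists, newest first."""
    sql = """
        SELECT 'video' AS kind, uid, title, created FROM video WHERE category = ?
        UNION ALL
        SELECT 'playlist' AS kind, uid, title, created FROM playlist WHERE category = ?
        ORDER BY created DESC
        LIMIT ? OFFSET ?
    """
    offset = (page - 1) * per_page
    return conn.execute(sql, (category, category, per_page, offset)).fetchall()


first_page = fetch_page(1, page=1)
last_page = fetch_page(1, page=3)
print(len(first_page), first_page[0])
print(len(last_page), last_page[-1])
```

Only the rows for the requested page leave the database, so relations would need resolving for 21 items instead of 3000.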

Flask-AppBuilder equivalent of SQLite WHERE clause to filter column data

I'm new to Flask and have started designing a front end for an inventory management database using Flask-AppBuilder.
I have created several models and have managed to display my sqlite data in tables using Flask-AppBuilder's views.
However, I don't seem to be able to find the equivalent of the SQLite WHERE clause to filter or "restrict" column data. I've been reading a lot about sqlalchemy, filters and queries, but this has left me more confused than anything else, and the explanations seem extremely elaborate and complicated for something which is extremely simple.
Assuming we reproduce the following SQLite query in Flask-AppBuilder:
SELECT Field_A
FROM Table_A
WHERE Field_A = 'some text'
with:
result = session.query(Table_A).filter_by(Field_A = 'some text').all()
Where does the above line of code go in my app?
Considering I have the following Class:
class Table_A(Model):
    id = Column(Integer, primary_key=True)
    Field_A = Column(String)

    def __repr__(self):
        return self
and View:
class Table_AView(ModelView):
    datamodel = SQLAInterface(Table_AView)
    label_columns = {'Field_A':'A'}
    list_columns = ['Field_A']
After much digging, I found that Flask-AppBuilder uses its own filter classes to let you filter your views.
All the classes are referenced here on GitHub:
Flask Filter Classes List
Also note the difference between FilterEqual and FilterEqualFunction here:
What is the difference between : FilterEqual and FilterEqualFunction?
For other customisation, the first port of call for Flask-AppBuilder is the API Reference, where you'll find a couple of examples of the filter classes in action.
In essence it is extremely simple. In your views.py, within the ModelView class you want to filter, simply add base_filters = [['Field_A', FilterEqual, 'abc']] like so:
from flask_appbuilder.models.sqla.filters import FilterEqual

class Table_AView(ModelView):
    # SQLAInterface wraps the model class, not the view itself
    datamodel = SQLAInterface(Table_A)
    label_columns = {'Field_A':'A'}
    list_columns = ['Field_A']
    base_filters = [['Field_A', FilterEqual, 'abc']]
This will only show the rows where Field_A is equal to abc.
Hope this helps someone as it took me nearly (sigh) two weeks to figure it out...
SQLAlchemy is an ORM (Object-Relational Mapper), which means that you don't have to deal with raw SQL; instead you call a query that you "build" (by adding filters, in your case). It transparently generates an SQL query, executes it, and returns the result as Python objects.
I would suggest reading the SQLAlchemy documentation on filters closely again, especially filter_by:
http://docs.sqlalchemy.org/en/latest/orm/query.html#sqlalchemy.orm.query.Query.filter_by
It is the easiest way to apply a WHERE with SQLAlchemy.
If you have declared the model for Table_A correctly, you should be able to use it like so:
result = session.query(Table_A).filter_by(Field_A = 'some text').all()
Here session.query(Table_A).filter_by(Field_A = 'some text') will generate the SQL, and .all() will execute it.
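To see roughly what that filter_by call boils down to, the same WHERE clause can be run against a throwaway in-memory database with Python's built-in sqlite3 module. This sketch deliberately leaves out SQLAlchemy and Flask-AppBuilder, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table_A (id INTEGER PRIMARY KEY, Field_A TEXT)")
conn.executemany(
    "INSERT INTO Table_A (Field_A) VALUES (?)",
    [("some text",), ("other text",), ("some text",)],
)

# Parameterised form of: SELECT ... FROM Table_A WHERE Field_A = 'some text'
rows = conn.execute(
    "SELECT id, Field_A FROM Table_A WHERE Field_A = ? ORDER BY id",
    ("some text",),
).fetchall()
print(rows)
```

With the ORM, the matching rows would come back as Table_A objects rather than plain tuples.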

Django-Haystack - How to use Haystack with django-comments?

I'm struggling with Django-Haystack.
I need to build an index that covers both Articles and the Comments on those articles. My question is how to put both the Articles and the Comments into a document-based index.
How can I search for keywords in the comments and in the articles, and output the article containing those keywords (whether they appear in the article itself or in its comments)?
Is it possible?
Best Regards,
The first thing to do is forget the notion that a SearchIndex must correspond exactly to a model. It's only sourced from one.
The simplest way to do this would be to add the comments to the indexed document using a template. This presumes your Article model has a title field:
from haystack import indexes

class ArticleIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    title = indexes.CharField(model_attr='title')

    def get_model(self):
        return Article
Note the keyword argument use_template is set to True. By default the template is looked up at search/indexes/{app_label}/{model_name}_{field_name}.txt. In that template, just output the content you want to index, e.g.:
{{ object.title|safe }}
{{ object.body|safe }}
{% for comment in object.comments.all %}
    {{ comment|safe }}
{% endfor %}
While I'm afraid the specific reverse relation name here is probably wrong, that's the gist of what you want to do. Again, this is a simple way of accomplishing what you've specifically stated.
This is what worked for me:
In your models.py, presuming comments are attached to an Article, you want a method that returns comments attached to it (there is no easy way to do this):
class Article:
    def comments(self):
        ids = [self.id]
        ctype = ContentType.objects.get_for_model(Article)
        comments = Comment.objects.filter(content_type=ctype,
                                          object_pk__in=ids,
                                          is_removed=False)
        return comments
In your search_indexes.py, make sure the ArticleIndex has use_template=True:
from django.contrib.contenttypes.models import ContentType
from django.contrib.comments.models import Comment

class ArticleIndex(SearchIndex):
    text = CharField(use_template=True)
In your index template, e.g. templates/search/indexes/article_text.txt:
{% for comment in object.comments.all %}
    {{ comment }}
{% endfor %}
Now, the only remaining problem is to update that specific index object when a comment is added or removed. Here we use signals:
In your models.py:
from django.dispatch import receiver
from haystack import site
from django.contrib.comments.signals import (comment_was_posted,
                                             comment_was_flagged)

# Re-index the parent article whenever one of its comments is posted
@receiver(comment_was_posted)
def comment_posted(sender, **kwargs):
    site.get_index(Article).update_object(kwargs['comment'].content_object)

# Re-index the parent article whenever one of its comments is flagged
@receiver(comment_was_flagged)
def comment_flagged(sender, **kwargs):
    site.get_index(Article).update_object(kwargs['comment'].content_object)

Haystack - Why does RealtimeSearchIndex sometimes not update my saved object

I'm using Haystack and Whoosh with Django
Within search_index.py I have this
class PageIndex(RealTimeSearchIndex):
    text = CharField(document=True, use_template=True)
    creator = CharField(model_attr='creator')
    created = DateTimeField(model_attr='created')
    org = CharField(model_attr='organisation')

site.register(Page, PageIndex)
My template looks like this
{{ object.name }}
{{ object.description }}
{{ object.template|striptags }}
{% for k,v in object.get_variables.items %}
    {{ v }}
{% endfor %}
If I save the Page with an updated name or description then it updates straight away and includes the variables from get_variables.items in the template. However if I update just the variable then it doesn't update.
Is it because variable is another object that's related to it and even though I am saving on the same page it does not pick up a change to the Page? If so how do I force to update the Page item when I'm updating related objects?
I concur with Daniel Hepper, but I think the easiest solution here is to attach a listener to your related model's post_save signal (see https://docs.djangoproject.com/en/dev/topics/signals/) and in that, reindex the model.
E.g., in myapp/models.py, given a model MyRelatedModel which has a ForeignKey to MyModel:
from django.db import models
from myapp.search_indexes import MyModelIndex

def reindex_mymodel(sender, **kwargs):
    # Re-index the parent MyModel whenever a related object is saved
    MyModelIndex().update_object(kwargs['instance'].mymodel)

models.signals.post_save.connect(reindex_mymodel, sender=MyRelatedModel)
A RealTimeSearchIndex only updates the search index when a model it is registered on is saved or deleted, or to be more precise, when the post_save/post_delete signal of the model is emitted. These signals are not emitted if a related model is deleted/saved or when a bulk update/delete operation is executed.
To solve your problem, you could create a subclass of RealTimeSearchIndex that also updates the index on post_save/post_delete signals of the related model.
Just a note for more recent viewers of this post ---- RealTimeSearchIndex has been deprecated.
See here for the Haystack post about it.
For recent viewers, here's a solution based on the new RealtimeSignalProcessor:
In myapp/signals.py:
from haystack.signals import RealtimeSignalProcessor

class RelatedRealtimeSignalProcessor(RealtimeSignalProcessor):

    def handle_save(self, sender, instance, **kwargs):
        # Also re-index any related objects named in instance.reindex_related
        if hasattr(instance, 'reindex_related'):
            for related in instance.reindex_related:
                related_obj = getattr(instance, related)
                self.handle_save(related_obj.__class__, related_obj)
        return super(RelatedRealtimeSignalProcessor, self).handle_save(sender, instance, **kwargs)

    def handle_delete(self, sender, instance, **kwargs):
        # Same idea for deletes
        if hasattr(instance, 'reindex_related'):
            for related in instance.reindex_related:
                related_obj = getattr(instance, related)
                self.handle_delete(related_obj.__class__, related_obj)
        return super(RelatedRealtimeSignalProcessor, self).handle_delete(sender, instance, **kwargs)
In settings.py:
HAYSTACK_SIGNAL_PROCESSOR = 'myapp.signals.RelatedRealtimeSignalProcessor'
In models.py:
class Variable(models.Model):
    reindex_related = ('page',)
    page = models.ForeignKey(Page)
Now when a Variable is saved, the index for the related Page will also be updated.
(TODO: This doesn't work for extended relationships like foo__bar, or for many-to-many fields. But it should be straightforward to extend it to handle those if you need to.)
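One possible way to handle foo__bar style paths is a small helper that walks the attributes one at a time. This is only a sketch: resolve_related is a hypothetical name that is not part of Haystack, and the SimpleNamespace objects stand in for model instances so the snippet runs without Django. Many-to-many fields are not covered.

```python
from types import SimpleNamespace


def resolve_related(instance, path):
    """Follow a 'foo__bar' style path and return the object at its end, or None."""
    obj = instance
    for attr in path.split("__"):
        obj = getattr(obj, attr, None)
        if obj is None:
            # Missing attribute or empty relation: nothing to re-index.
            return None
    return obj


# Stand-ins for model instances: variable -> section -> page
page = SimpleNamespace(name="home")
section = SimpleNamespace(page=page)
variable = SimpleNamespace(section=section)

print(resolve_related(variable, "section__page").name)
print(resolve_related(variable, "section__missing"))
```

Inside handle_save and handle_delete, the plain getattr(instance, related) call could then be swapped for resolve_related(instance, related), skipping the re-index when it returns None.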
