Searching through related indices on Algolia

I'm trying to find out if there's a performant way to search through my current data structures, or if I have to restructure them.
I have the following structure for my indices:
Publication (attributes: id, title, keywords)
PublicationFile (attributes: id, publication_id, text, page_number)
A publication has many publication files; each publication file holds the file's contents and the page they were found on (text and page_number).
title, keywords, and text are the searchable attributes, so if someone searches for 'economy' I want to search through both my indices.
I would like to perform a search that searches through both indices and returns the results in a manner that allows me to do something like this:
Publication1
keyword1 keyword2
Found results in Publication1's file contents in: [file a (pages: 1, 2, 3), file b (pages: 5)]
So I want the search to return results grouped by publication ID. The only way I can think of right now is to search both indices, then loop through the results and link the file/page matches to a publication.
In summary my questions are:
Is there a way I can structure my data to avoid the nested loops to process it?
Is there a way I can do this through Algolia without having to modify my structure? I would ideally want to re-use Algolia's frontend searching code and avoid processing this data on my backend.

To answer your questions:
1) Yes, I'll go into more detail below.
2) Unfortunately, no; you'll have to modify your data structure.
Here is how I'd recommend you structure your data to achieve what you're trying to do.
{
  objectID: "publicationFileId",
  publicationId: "",
  title: "",
  keywords: ["", ""],
  text: "",
  page_number: 1,
  published_at: 1485892992 // timestamp
}
Essentially, you need to flatten your two indices into a single one. Modifying the data structure will cause fewer headaches down the road than maintaining that client-side code, and it will perform better too.
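A minimal sketch of that flattening step, assuming the two indices are available as plain arrays with the fields from the question (`id`, `title`, `keywords` for publications; `id`, `publication_id`, `text`, `page_number` for files). It builds one record per PublicationFile in the shape shown above (without the optional `published_at`); the function name is hypothetical, and pushing the records to Algolia with your client's save/batch method is not shown.

```javascript
// Denormalize the two tables into one record per PublicationFile, copying the
// parent Publication's title and keywords onto each record.
// Input shapes mirror the question's indices; they are assumptions, not a real API.
function flattenRecords(publications, publicationFiles) {
  // Index publications by id once, so the join below is a single pass
  // over the files instead of a nested loop.
  const byId = new Map(publications.map((p) => [p.id, p]));

  return publicationFiles
    .filter((file) => byId.has(file.publication_id)) // skip files with no parent
    .map((file) => {
      const pub = byId.get(file.publication_id);
      return {
        objectID: String(file.id), // unique per record: the publication file id
        publicationId: pub.id,
        title: pub.title,
        keywords: pub.keywords,
        text: file.text,
        page_number: file.page_number,
      };
    });
}

// Example input shaped like the question's Publication / PublicationFile data.
const records = flattenRecords(
  [{ id: 1, title: "Publication1", keywords: ["keyword1", "keyword2"] }],
  [
    { id: 10, publication_id: 1, text: "the economy grew", page_number: 1 },
    { id: 11, publication_id: 1, text: "economic outlook", page_number: 5 },
  ]
);
console.log(records.map((r) => `${r.objectID}:${r.title}:p${r.page_number}`));
```

Because every record already carries `title` and `keywords`, a single query for "economy" matches publication-level and page-level text in one index.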
A few articles and documentation pages that explain why:
https://blog.algolia.com/inside-the-engine-part-7-better-relevance-via-dedup-at-query-time/
https://www.algolia.com/doc/guides/search/distinct/
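The distinct guide linked above covers collapsing hits at query time through the `attributeForDistinct` and `distinct` index settings (here `attributeForDistinct` would be `publicationId`), which keeps the best hit per publication. If you instead want every matching page, as in the question's "pages: 1, 2, 3" display, one option is to leave distinct off and group the hits in the frontend after each search. A minimal sketch, assuming the hits arrive as a plain array with the flattened fields; the function name is hypothetical.

```javascript
// Group flat Algolia hits by publicationId in one pass over the hits.
// Assumes distinct is disabled, so each matching page arrives as its own hit,
// and that hits carry the flattened fields (publicationId, title, keywords, page_number).
function groupHitsByPublication(hits) {
  const groups = new Map(); // publicationId -> group object
  for (const hit of hits) {
    let group = groups.get(hit.publicationId);
    if (!group) {
      group = {
        publicationId: hit.publicationId,
        title: hit.title,
        keywords: hit.keywords,
        pages: [],
      };
      groups.set(hit.publicationId, group);
    }
    group.pages.push(hit.page_number);
  }
  // A Map iterates in insertion order, so publications stay in the order of
  // their best-ranked hit; pages are sorted numerically for display.
  return [...groups.values()].map((g) => ({
    ...g,
    pages: [...g.pages].sort((a, b) => a - b),
  }));
}

const grouped = groupHitsByPublication([
  { publicationId: 1, title: "Publication1", keywords: ["keyword1", "keyword2"], page_number: 2 },
  { publicationId: 2, title: "Publication2", keywords: [], page_number: 7 },
  { publicationId: 1, title: "Publication1", keywords: ["keyword1", "keyword2"], page_number: 1 },
]);
for (const g of grouped) {
  console.log(`${g.title} (${g.keywords.join(" ")}): pages ${g.pages.join(", ")}`);
}
```

Since this only touches the hits Algolia already returned, it can run next to the frontend search code without any backend processing.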
Hope this helps!
Maxime

Related

ArangoSearch: how to search without specifying the document field?

I'm looking into ArangoSearch for the first time and it looks like a pretty good functionality.
However, in all the tutorials, even though you can tell it to index all fields, you cannot do a 'blind' search across all fields of a document. Take the example below:
FOR d in myView SEARCH d.text IN ["quick", "brown"] RETURN d
I don't seem to be able to just search d as a whole without specifying each individual field I want to include in my search. Is that correct, and if so, why, and are there workarounds? I'm dealing with many collections with many different fields that can contain a relevant term; it would be a shame to have to list all of them to make an expansive search.
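If the fields do have to be named, one workaround is to generate the list rather than write it by hand. A minimal sketch, assuming you can enumerate the candidate field names (for example from your own schema); `buildSearchQuery` and the field names are hypothetical, and each condition mirrors the question's `d.text IN [...]` form, ORed together.

```javascript
// Generate an AQL query that ORs the same term list across many fields,
// so the field list can come from code instead of being written by hand.
// The view and field names are hypothetical; terms are passed as the bind parameter @terms.
function buildSearchQuery(viewName, fields) {
  if (fields.length === 0) throw new Error("need at least one field");
  const identifier = /^[A-Za-z_][A-Za-z0-9_]*$/;
  // Names are spliced into the query text, so reject anything that isn't a plain identifier.
  for (const name of [viewName, ...fields]) {
    if (!identifier.test(name)) {
      throw new Error(`unsupported name: ${name}`);
    }
  }
  // Backticks quote attribute names in AQL, so a field named like a keyword still parses.
  const conditions = fields.map((f) => `d.\`${f}\` IN @terms`).join(" OR ");
  return `FOR d IN ${viewName} SEARCH ${conditions} RETURN d`;
}

const query = buildSearchQuery("myView", ["text", "title", "summary"]);
console.log(query);
// The query would be sent with a bind variable such as { terms: ["quick", "brown"] }.
```

The terms stay in a bind parameter, so only the validated field names end up in the query text.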

Is list function a good candidate for my scenario?

I have a view in CouchDB that is defined like this:
function (doc) {
  if (doc.url) {
    var a = new Date(doc.postedOn);
    emit([a.toLocaleDateString(), doc.count, doc.userId], {
      _id: doc.userId,
      postTitle: doc.postTitle,
      postSummary: doc.postSummary,
      url: doc.url,
      count: doc.count
    });
  }
};
This gives me results in the format I want, sorted first by date, then by count, and then by userId.
However, I have trouble querying it. I want to query this view by userId alone, that is, leaving the date and count parameters null.
_view/viewName?limit=20&descending=true&endkey=["","","userId"]
does not give me the desired result.
Should I be using a list function to filter the results of the view? Is there any performance impact if I do this?
This quote from The Definitive Guide first gave me the idea that list functions could be used to filter and aggregate results:
The powerful iterator API allows for flexibility to filter and aggregate rows on the fly, as well as output raw transformations for an easy way to make Atom feeds, HTML lists, CSV files, config files, or even just modified JSON.
A list function won't help in your case. From the docs you've linked to yourself:
While Show functions are used to customize document presentation, List functions are used for same purpose, but against View functions results.
Show functions are used to represent documents in various formats, commonly as HTML page with nicer formatting. They can also be used to run server-side functions without requiring a pre-existing document.
To solve your problem just change the order of the emitted keys, putting userId first, i.e.:
[ doc.userId, a.toLocaleDateString(), doc.count ]
and update your query appropriately.
If changing the order of emitted keys is not an option, just create another view.
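A sketch of the reordered view, run outside CouchDB with a local stand-in for `emit()` (CouchDB provides the real one); the sample documents are hypothetical. Only the key order changes from the question's map function.

```javascript
// Local stand-in for CouchDB's built-in emit(), only so the map function can be
// exercised outside CouchDB. In a design document you store just the function.
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// The question's map function with userId moved to the front of the key,
// so all rows for one user sit next to each other in the view index.
const byUser = function (doc) {
  if (doc.url) {
    var a = new Date(doc.postedOn);
    emit([doc.userId, a.toLocaleDateString(), doc.count], {
      _id: doc.userId,
      postTitle: doc.postTitle,
      postSummary: doc.postSummary,
      url: doc.url,
      count: doc.count,
    });
  }
};

// Hypothetical sample documents: only the one with a url is emitted.
byUser({ url: "http://example.com/post", postedOn: "2017-01-31", count: 3, userId: "alice", postTitle: "Hello", postSummary: "First post" });
byUser({ postedOn: "2017-02-01", count: 1, userId: "bob" });
console.log(rows.map((r) => r.key[0]));
```

Against the real view, one user's rows can then be selected with a key range, using {} as the upper bound because objects collate after strings and numbers in CouchDB (the keys need URL-encoding in practice):
_view/viewName?startkey=["userId"]&endkey=["userId",{}]&limit=20
With descending=true, swap the two bounds:
_view/viewName?descending=true&startkey=["userId",{}]&endkey=["userId"]&limit=20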

Mongoose: Only return one embedded document from array of embedded documents

I've got a model that contains an array of embedded documents. These embedded documents keep track of the points the user has earned in a given activity. Since a user can take part in several activities or just one, it makes sense to keep these activities in an array. Now I want to extract the hall of fame: the top ten users for a given activity. Currently I'm doing it like this:
userModel.find({ "stats.activity": "soccer" }, ["stats", "email"])
.desc("stats.points")
.limit(10)
.run (err, users) ->
(If you're wondering about the syntax, it's CoffeeScript.)
Here "stats" is the array of embedded documents/activities.
This actually works, but so far I'm only testing with accounts that have a single activity. I assume something will go wrong (sorting-wise) once a user has more activities. Is there any way I can tell Mongoose to return only the embedded document where "activity" == "soccer" alongside the top-level document?
By the way, I realize I can do this another way, by putting stats in its own collection with a DBRef to the relevant user, but I'm wondering if it's possible to do it like this before I consider any rewrites.
Thanks!
You are correct that this won't work once you have multiple activities in your array.
Specifically, since a query can't return just the matching subset of an array, you'll get back the whole array, and the sort will consider all points values, not just the ones paired with "activity": "soccer".
There is a pretty simple tweak that you could make to your schema to get around this though. Don't store the activity name as a value, use it as the key.
{ _id: userId,
  email: email,
  stats: [
    {soccer: points},
    {rugby: points},
    {dance: points}
  ]
}
Now you will be able to query and sort like so:
users.find({"stats.soccer":{$gt:0}}).sort({"stats.soccer":-1})
Note that once you move to version 2.2 (currently only available as the unstable development version 2.1), you will be able to use the aggregation framework to get exactly the results you want (only the subset of an array or subdocument that matches your query) without changing your schema.
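A sketch of what such an aggregation pipeline could look like for the original schema (`stats: [{ activity, points }]`), plus a plain-JavaScript function that mimics it on in-memory data so the behavior can be checked without a database. Both function names are hypothetical; in current Mongoose versions the pipeline array can be passed to `Model.aggregate()`.

```javascript
// Aggregation pipeline for a "hall of fame": the top `n` users for one activity,
// returning only that activity's points (original schema, MongoDB 2.2+).
function hallOfFamePipeline(activity, n) {
  return [
    { $match: { "stats.activity": activity } }, // users that have this activity at all
    { $unwind: "$stats" },                      // one document per stats entry
    { $match: { "stats.activity": activity } }, // keep only the matching entry
    { $sort: { "stats.points": -1 } },
    { $limit: n },
    { $project: { email: 1, points: "$stats.points" } },
  ];
}

// Plain-JS equivalent of the same pipeline, only to illustrate what it returns.
function hallOfFameInMemory(users, activity, n) {
  return users
    .flatMap((u) =>
      u.stats
        .filter((s) => s.activity === activity)
        .map((s) => ({ _id: u._id, email: u.email, points: s.points }))
    )
    .sort((a, b) => b.points - a.points)
    .slice(0, n);
}

// Hypothetical users: user 1's rugby points must not affect the soccer ranking.
const users = [
  { _id: 1, email: "a@example.com", stats: [{ activity: "soccer", points: 5 }, { activity: "rugby", points: 50 }] },
  { _id: 2, email: "b@example.com", stats: [{ activity: "soccer", points: 9 }] },
  { _id: 3, email: "c@example.com", stats: [{ activity: "dance", points: 99 }] },
];
const top = hallOfFameInMemory(users, "soccer", 10);
console.log(top);
```

Unwinding before sorting is what makes the ranking use only the soccer entry of each user, which is exactly the problem the plain find/sort has with multiple activities.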

CouchDB view collation, join on one key, search on other values

I'm looking at the example described in "CouchDB Joins".
It discusses view collation and how you can have one document for your blog posts, and then each comment is a separate document in CouchDB. So for example, I could have "My Post" and 5 comments associated with "My Post" for a total of 6 documents. In their example, "myslug" is stored both in the post document, and each comment document, so that when I search CouchDB with the key "myslug" it returns all the documents.
Here's the problem/question. Let's say I want to search on the author in the comments and a post that also has a category of "news". How would this work exactly?
So for example:
function(doc) {
if (doc.type == "post") {
emit([doc._id, 0], doc);
} else if (doc.type == "comment") {
emit([doc.post, 1], doc);
}
}
That will load my blog post and comments based on this: ?startkey=["myslug"]
However, I want to grab the comments by author bob together with the post that has the category news. In this example, bob has written three comments on the blog post with the category news. It seems as if CouchDB only allows me to search on keys that exist in both documents, not on a key in one document and a key in another that are "joined" together by the map function.
In other words, if a post and its comments are joined by a slug, how do I search on one field in one document and another field in another document that are joined by the ID (a.k.a. the slug)?
In SQL it would be something like this:
SELECT * FROM comments JOIN posts ON posts.id = comments.post WHERE comments.author = 'bob' AND posts.category = 'news'
I've been investigating CouchDB for about a week, so I'm hardly qualified to answer your question, but I think I've come to the conclusion that it can't be done. Each view row has to be tied to one and only one document so the view can be updated. You are going to have to denormalize, at least if you don't want to do a brute-force search. If anyone has come up with a clever way to do this, I'd really like to know.
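The denormalize route could look like the following sketch. It assumes each comment document also stores a copy of its post's category in a hypothetical `post_category` field, written by your application when the comment is saved; the `emit()` stand-in exists only to run the map function outside CouchDB.

```javascript
// Local stand-in for CouchDB's emit(), only for running the map function here.
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// Hypothetical denormalized view: comments keyed by [post category, author].
const commentsByCategoryAndAuthor = function (doc) {
  if (doc.type === "comment" && doc.post_category) {
    emit([doc.post_category, doc.author], null);
  }
};

// Hypothetical documents: one post and two comments on it.
[
  { _id: "myslug", type: "post", category: "news" },
  { _id: "c1", type: "comment", post: "myslug", post_category: "news", author: "bob" },
  { _id: "c2", type: "comment", post: "myslug", post_category: "news", author: "alice" },
].forEach((doc) => commentsByCategoryAndAuthor(doc));

// Simulates querying the view with key=["news","bob"].
const bobInNews = rows.filter((r) => r.key[0] === "news" && r.key[1] === "bob");
console.log(bobInNews.length);
```

In CouchDB the equivalent query would be _view/viewName?key=["news","bob"]&include_docs=true, which returns bob's comments; each comment's post field then identifies the post to load. The trade-off is that changing a post's category means rewriting its comments too.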
There are several ways that you can approximate a SQL join on CouchDB. I've just asked a similar question here: Why is CouchDB's reduce_limit enabled by default? (Is it better to approximate SQL JOINS in MapReduce views or List views?)
You can use MapReduce (not a good option)
You can use lists (This will iterate over a result set before emitting results, meaning you can 'combine' documents in a number of creative ways)
You can also apparently use 'collation', though I haven't figured this out yet (seems like I always get a count and can only use the feature with Reduce - if I'm on the right track)

Filtering results in Solr

I'm trying to build autosuggest functionality using Solr. The index contains different locations within a city and looks something like this:
id: unique id
name: the complete name
type: can be one of 'location_zone', 'location_subzone', 'location_city', 'outlet', 'landmark' ...
city: city id
Now, when the user types something, I want it to return suggestions only from the current city and of type location_*, something similar to WHERE city_id = 1 AND type LIKE 'location_%' in SQL.
I guess one way to do it is with faceting, but is that the right way? Will it still search all documents and then filter the results, or will it apply the condition first, as MySQL would?
PS: I'm new to Solr and would appreciate it if you could point out any mistakes in the approach.
Solr does provide filtering, via the fq parameter. What you're looking for should be something along the lines of:
&fq=city:1&fq=type:location_*&q=...
This page illustrates very well how and when to use filter queries in Solr.
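A sketch of building that request from code, using the field names from the index described in the question (`city`, `type`, `name`). The host, port, and core name `locations` are hypothetical, the user's query is passed through as-is (special characters would need escaping), and the wildcard filter assumes `type` is a string field. Filter queries restrict the result set without affecting relevance scores, and Solr caches each one separately in its filterCache.

```javascript
// Build a Solr select URL with the user's query (q) and two filter queries (fq).
// Host, port and core name are hypothetical; URLSearchParams handles encoding
// and allows the repeated fq parameter.
function buildSuggestUrl(cityId, q) {
  const params = new URLSearchParams();
  params.append("q", q);
  params.append("fq", `city:${cityId}`);   // only the current city
  params.append("fq", "type:location_*"); // only location_* types
  return `http://localhost:8983/solr/locations/select?${params.toString()}`;
}

const url = buildSuggestUrl(1, "name:ban*");
console.log(url);
```

Keeping the two conditions as separate fq parameters, rather than one combined filter, lets each be cached and reused independently as users type.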
