Elasticsearch query based on result of initial query - search

I have an interesting problem. I have an Elasticsearch query that returns a set of results; however, some of these results have other documents associated with them, like comments associated with a forum post.
If a forum post is matched by my query, I want the query to also return its associated comments, which all have a parent_id pointing to the original forum post.
Is that possible?

In case anybody comes across this in the future, here is the answer: parent/child inner hits.
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-inner-hits.html#parent-child-inner-hits
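A minimal sketch of such a request body, built as a Python dict. It assumes the index uses a join field that maps posts (parents) to comments (children) with a relation named `comment`, and that posts have a `title` field. A plain parent_id keyword field is not enough for parent/child queries. Those names and the helper `posts_with_comments_query` are hypothetical. The `bool` query requires the post itself to match, while the optional `has_child` clause with `inner_hits` attaches each matching post's comments to its hit.

```python
def posts_with_comments_query(text, child_type="comment", max_comments=100):
    """Build a _search body: match posts on a (hypothetical) `title` field
    and return their child comments as inner hits."""
    return {
        "query": {
            "bool": {
                # The parent document (the forum post) must match the query.
                "must": [{"match": {"title": text}}],
                # Optional clause: posts that have children of `child_type`
                # get those children attached via inner_hits.
                "should": [
                    {
                        "has_child": {
                            "type": child_type,
                            "query": {"match_all": {}},
                            # Raise inner_hits size from its default of 3.
                            "inner_hits": {"size": max_comments},
                        }
                    }
                ],
            }
        }
    }
```

The resulting dict can be sent as the JSON body of a `_search` request; the comments should then appear under each post hit's `inner_hits`, keyed by default by the relation name.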

Related

MongoDB countDocuments vs exists vs find vs findOne, which is the fastest way to check if a doc exists?

I am finding different opinions on different blogs and in different answers!
Please forgive me in advance if this seems like a stupid question, but your help would be appreciated!
This issue still seems unclear to me. There are many ways to check whether a document exists, but I am not sure which is the fastest, given all the updates MongoDB has released up to now (2020).
find({ <PREDICATE> }).limit(1) and findOne({ <PREDICATE> }) take the same amount of time, because the query optimizer evaluates them the same way.
The fastest queries in MongoDB are covered queries: queries that are answered entirely from an index, with no FETCH stage. If you simply want to check whether a document exists, this will be the fastest query:
db.users.find({_id: 10021}, {_id: 1})
The query above relies on the _id index (which always exists) and projects only _id, so it never needs to fetch the document itself.
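The same existence check from Python, as a minimal sketch assuming the PyMongo driver, where `Collection.find_one(filter, projection)` returns `None` when nothing matches. The helper name `doc_exists` is hypothetical.

```python
def doc_exists(collection, doc_id):
    """Return True if a document with this _id exists.

    Projecting only _id keeps the lookup on the _id index, as described
    above, instead of returning the full document.
    """
    # find_one returns None when no document matches the filter.
    return collection.find_one({"_id": doc_id}, {"_id": 1}) is not None
```

Usage would look like `doc_exists(db.users, 10021)`, with `db` being a PyMongo database handle.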
count() without a predicate is not always accurate, because it relies on collection metadata.
If you want exact count, you should do it in an aggregation pipeline.
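A sketch of an exact count done in an aggregation pipeline, again assuming PyMongo. `$match` filters the documents and `$count` emits a single `{"n": <count>}` document, or no document at all when nothing matches, so the helper maps an empty result to 0. Both function names are hypothetical.

```python
def exact_count_pipeline(predicate):
    """Aggregation pipeline counting the documents that match `predicate`."""
    return [{"$match": predicate}, {"$count": "n"}]


def count_from_result(result_docs):
    # $count emits nothing when there are no matches, so empty means 0.
    return result_docs[0]["n"] if result_docs else 0
```

With PyMongo this would be used as `count_from_result(list(db.users.aggregate(exact_count_pipeline({"active": True}))))`. As far as I know, PyMongo's `count_documents` also runs an aggregation under the hood.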

Mongo Schema for Quiz Site

I'm building a small Node/Mongo app that serves users with up to 3 questions per day. Users can only answer yes or no and the correct answer will be determined at a later time (these questions are closer to predictions). Currently, I have these documents:
User: id
Question: id
QuestionAnswer: id, question_id (ref)
UserAnswer: id, question_id (ref), user_id (ref)
What is the most efficient way to query the db so I get today's questions but also check whether that user has answered that question already? I feel like I'm overthinking it. I've tried a couple ways that seem to be overkill.
It's better to put everything in one schema, since MongoDB doesn't have joins the way a relational database does. It is faster than using relations.
Also, to keep your query small, take a look at this.
You should stay away from relations until you have a good reason to use them. So what you need is only one schema.
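A minimal sketch of the single-schema suggestion. It assumes each question document embeds an `answers` array of `{"user_id": ..., "answer": bool}` sub-documents and stores its date as an ISO string in a `day` field; these field names and the helper functions are hypothetical. One query then returns today's questions, and the `$elemMatch` projection includes only the current user's answer, so you can tell whether they have answered without a second lookup.

```python
from datetime import date


def todays_questions_query(day, user_id):
    """Filter and projection for the hypothetical embedded schema."""
    flt = {"day": day.isoformat()}
    projection = {
        "text": 1,
        # Include at most the first answer that belongs to this user.
        "answers": {"$elemMatch": {"user_id": user_id}},
    }
    return flt, projection


def has_answered(question_doc):
    # $elemMatch leaves `answers` out entirely when the user has no answer.
    return bool(question_doc.get("answers"))
```

With PyMongo, `db.questions.find(*todays_questions_query(date.today(), user_id))` would return the day's questions, each carrying either the user's answer or no `answers` field.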

Doctrine2: Limiting with Left Joins / Pagination - Best Practice

I have a big query (in my query builder) with a lot of left joins, so I get articles with their comments, tags, and so on.
Let's say I have the following DQL:
$dql = 'SELECT blogpost, comments, tags
FROM BlogPost blogpost
LEFT JOIN blogpost.comments comments
LEFT JOIN blogpost.tags tags';
Now let's say my database has more than 100 blog posts, but I only want the first 10, together with all the comments and tags of those 10, if they exist.
If I use setMaxResults, it limits the rows, not the posts. So I might get the first two posts, but the last one is missing some of its comments or tags. So the following doesn't work:
$result = $em->createQuery($dql)->setMaxResults(15)->getResult();
Using the barely documented pagination solution that ships with Doctrine 2.2 doesn't really work for me either: it is so slow that I might as well load all the data.
I tried the solutions in the Stack Overflow article, but even that article is missing a best practice, and the presented solution is extremely slow.
Isn't there a best practice for this?
Is nobody using Doctrine 2.2 in production?
Getting the proper results with a query like this is problematic. There is a tutorial on the Doctrine website explaining this problem.
Pagination
The tutorial is more about pagination rather than getting the top 5 results, but the overall idea is that you need to do a "SELECT DISTINCT a.id FROM articles a ... LIMIT 5" instead of a normal SELECT. It's a little more complicated than this, but the last 2 points in that tutorial should put you on the right track.
Update:
The problem here is not Doctrine, or any other ORM. The problem lies squarely on the database being able to return the results you're asking for. This is just how joins work.
If you do an EXPLAIN on the query, it will give you a more in depth answer of what is happening. It would be a good idea to add the results of that to your initial question.
Building on what is discussed in the Pagination article, it would appear that you need at least 2 queries to get your desired results. Adding DISTINCT to a query has the potential to dramatically slow it down, but it's only really needed if the query has joins. You could write one query that just retrieves the first 10 posts ordered by created date, without the joins. Once you have the IDs of those 10 posts, run another query with your joins and a WHERE blogpost.id IN (...) ORDER BY blogpost.created. This method should be much more efficient.
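SELECT
bp.id
FROM
Blogpost bp
ORDER BY
bp.created DESC
DQL has no LIMIT clause, so apply the limit with setMaxResults(10); because this query has no joins, that limits posts rather than joined rows. Since all you care about in the first query are the IDs, you could also have Doctrine use Scalar Hydration (e.g. getScalarResult()).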
SELECT
bp, c, t
FROM
Blogpost bp
LEFT JOIN
bp.comments c
LEFT JOIN
bp.tags t
WHERE
bp.id IN (...)
ORDER BY
bp.created DESC
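The two-query approach can be sketched outside Doctrine with Python's built-in sqlite3 module. The `blogpost` and `comment` tables and their columns are hypothetical stand-ins for the mapped entities. The first query pages over IDs only, so `LIMIT` counts posts; the second fetches the joined comments for just those IDs. Tags would follow the same pattern, since joining comments and tags in a single query multiplies the rows.

```python
import sqlite3


def latest_posts_with_comments(conn, n=10):
    """Return {post_id: [comment bodies]} for the n newest posts."""
    # Step 1: no joins here, so LIMIT counts posts, not joined rows.
    ids = [row[0] for row in conn.execute(
        "SELECT id FROM blogpost ORDER BY created DESC LIMIT ?", (n,))]
    if not ids:
        return {}
    placeholders = ",".join("?" * len(ids))  # e.g. "?,?,?"
    # Step 2: LEFT JOIN keeps posts that have no comments at all.
    sql = (
        "SELECT bp.id, c.body FROM blogpost bp "
        "LEFT JOIN comment c ON c.post_id = bp.id "
        f"WHERE bp.id IN ({placeholders}) "
        "ORDER BY bp.created DESC, c.id"
    )
    result = {post_id: [] for post_id in ids}  # keeps newest-first order
    for post_id, body in conn.execute(sql, ids):
        if body is not None:
            result[post_id].append(body)
    return result
```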
You could also probably do it in one query using a correlated subquery. Subqueries are not always bad; sometimes they are faster than joins. You will need to experiment to find out which solution works best for you.
Edit in light of the clarified question:
You can do what you want in native MySQL using a subquery in the FROM clause as such:
SELECT * FROM
(SELECT * FROM articles ORDER BY date LIMIT 5) AS limited_articles
LEFT JOIN comments ON comments.article_id = limited_articles.article_id
LEFT JOIN tags ON tags.article_id = limited_articles.article_id
As far as I know, DQL does not support subqueries in the FROM clause like this, so you would need to use the NativeQuery class.
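A runnable sketch of the derived-table idea, using sqlite3 only because it ships with Python (the answer targets MySQL). It uses a LEFT JOIN so articles without comments are still returned, and it orders the outer query explicitly, since the inner ORDER BY only decides which articles make the cut. Table and column names follow the answer's SQL, except `comment_id` and `body`, which are hypothetical; tags are left out for brevity.

```python
import sqlite3

# The inner SELECT limits articles; the outer LEFT JOIN adds their comments.
LIMITED_ARTICLES_SQL = """
SELECT la.article_id, c.body
FROM (SELECT * FROM articles ORDER BY date LIMIT ?) AS la
LEFT JOIN comments c ON c.article_id = la.article_id
ORDER BY la.date, c.comment_id
"""


def first_articles_with_comments(conn, n=5):
    """Return (article_id, comment_body_or_None) rows for the first n articles."""
    return conn.execute(LIMITED_ARTICLES_SQL, (n,)).fetchall()
```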

How to store votes for CouchDB document?

I am looking for a good example how to store votes in a document.
For example, say we have a post document and users can vote for it.
If I store the vote in a field in the document, for example:
votes : 12345
What will happen if the author is editing the post and someone votes during that time? The author will not be able to save, because somebody voted and the document now has a new revision.
The other option is to store votes separately: either each vote as its own document, or one votes document per post?
If I decide to store every vote in a separate document, how difficult will it be to aggregate this data? Or do I have to calculate it each time I show the document?
What are your solutions?
This will result in a conflict. There's a chapter in the CouchDB Guide about handling conflicts.
http://guide.couchdb.org/draft/conflicts.html
If you use middleware (such as a PHP layer), it can detect and handle the conflict (see the wiki for example code: http://wiki.apache.org/couchdb/Replication_and_conflicts).
If you want to offer a pure CouchApp it should be possible to use update handlers to manage some common conflict cases automatically. http://wiki.apache.org/couchdb/Document_Update_Handlers
If it works, I would prefer to store the votes in the document. But I haven't tried either of these approaches myself yet, so I would be happy if you shared your solution.
I found this article to be very helpful on the subject of how to avoid conflicts when many users will be updating a document, such as voting or adding comments to a blog post.
http://www.cmlenz.net/archives/2007/10/couchdb-joins
The third and best(?) solution was to store each comment as a separate document with a link to the blog post. Using complex keys made it very easy to query for all comments belonging to a post, as well as for all comments made by a user, even sorted in chronological order.
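To make the aggregation question concrete, here is a plain-Python simulation of a CouchDB view over one document per vote. In CouchDB itself the map function would be JavaScript, e.g. `function(doc) { if (doc.type == 'vote') emit(doc.post_id, 1); }`, paired with the built-in `_sum` reduce and queried with `group=true`; the `type`, `post_id` and `user` fields are assumptions. Because CouchDB updates view indexes incrementally, totals are not recomputed from every vote on each read, and because each vote is a new document, voting never conflicts with the author's edits to the post.

```python
from collections import defaultdict


def map_vote(doc):
    """Stand-in for the view's map function: one (key, value) row per vote."""
    if doc.get("type") == "vote":
        yield doc["post_id"], 1


def vote_counts(docs):
    """Simulate querying the view with a _sum reduce and group=true."""
    totals = defaultdict(int)
    for doc in docs:
        for key, value in map_vote(doc):
            totals[key] += value  # _sum adds up the emitted values per key
    return dict(totals)
```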

Complex search query in lucene (querying fields which are indexed as numeric, analyzed or not-analyzed using a simple analyzer)

I am building a search application using Lucene. Some of my queries are complex. For example, my documents contain the fields location and population, where location is a not-analyzed field and population is a numeric field. Now I need to return all the documents that have location "san-francisco" and population between 10000 and 20000. If I combine these two fields and build a query like this:
location:san-francisco AND population:[10000 TO 20000], I am not getting the correct results. Any suggestions on why this could be happening and what I can do?
Also, while building complex queries, some of the fields I am including are analyzed while others are not. For instance, the location field is not analyzed and contains terms like chicago, san-francisco and so on, while the summary field is analyzed and generally contains a descriptive paragraph.
Consider this query:
location:san-francisco AND summary:"great restaurants"
Now if I use a StandardAnalyzer at search time, I do not get correct results when the location field contains a term like san-francisco or los-angeles (i.e. it cannot handle the hyphen in between). But if I use a KeywordAnalyzer for the query, I do not get correct results either, because it cannot search for the phrase "great restaurants" in the summary field.
First, I would recommend tackling this one problem at a time. From my reading of your post, it sounds like you have multiple issues:
You're unsure why a particular query is not returning any results.
You're unsure why some fields are not being analyzed.
You're having problems with the built-in analyzers dealing with hyphens.
That's how your post reads. If that's correct, I would suggest you post each question separately. You'll get better answers if the question is precise. It's overwhelming trying to answer your question in the current format.
Now, let me take a stab in the dark at some of your problems:
For your first problem, if you're getting into really complex queries in Lucene, ask yourself whether it makes sense to be doing these queries here, rather than in a proper database. For a more generic answer, I'd try isolating the problem by removing parts of the query until you get results back. Once you find out what part of the query is causing no results, we can debug that further.
For the second problem, check the document you're adding to Lucene. Lucene provides options to store data but not index it. Make sure you've got the right option specified when adding fields to the document.
For the third problem, if the built-in analyzers break on hyphens in a way that doesn't work for you, build your own analyzer. I ran into a similar issue with the '#' symbol, and to solve it I wrote a custom analyzer that handled it properly. You could do the same for hyphens.
You should use PerFieldAnalyzerWrapper. As the name suggests, it lets you use different analyzers for different fields. In this case, you can use KeywordAnalyzer for the location (city name) field and StandardAnalyzer for the text fields.
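A toy Python model of the per-field idea, not Lucene's API. In Java, recent Lucene versions construct the wrapper from a default analyzer plus a `Map<String, Analyzer>` of per-field overrides (older 3.x releases used `addAnalyzer(field, analyzer)`), and the same wrapper should be used at both indexing and query time. Here `simple_standard_analyzer` is only a rough stand-in for StandardAnalyzer's tokenization, and all names are hypothetical.

```python
import re


def keyword_analyzer(text):
    # Like KeywordAnalyzer: the whole value becomes a single token.
    return [text]


def simple_standard_analyzer(text):
    # Rough approximation: lowercase and split on non-alphanumerics,
    # which is why "san-francisco" turns into two tokens.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


class PerFieldAnalyzer:
    """Toy model of PerFieldAnalyzerWrapper: per-field override, else default."""

    def __init__(self, default, per_field):
        self.default = default
        self.per_field = per_field

    def analyze(self, field, text):
        return self.per_field.get(field, self.default)(text)
```

With `location` mapped to the keyword analyzer, `location:san-francisco` stays one term, while `summary:"great restaurants"` is still tokenized for phrase matching.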
