Text Searching and Text Indexing for nested fields in MongoDB - node.js

I have a document structure that looks something like this
{
name:
hobbies:[
{ tag: "food", description: "eating"},
{ tag: "soccer", description: "PL"}
]
}
Is it possible to achieve Text Indexing only on the tag subfield, so that I can attempt text searches with only the tag subfield being checked?
This is what I'm currently trying, but it ends up checking the description subfield as well.
db.users.createIndex({"hobbies" : "text"})
Thanks for your time.

I was able to get past this by indexing the nested field directly. Because hobbies is an array, the index is a multikey index, which means an index entry is created for every element of the array. In my case the array contains nested objects, and creating the text index on the tag subfield worked like this:
db.users.createIndex( { "hobbies.tag": "text" } )
You can read more about this in the MongoDB multikey index docs.
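For anyone doing this from Node.js, here is a minimal sketch of the same idea. It assumes a recent Node.js driver with promise support; `searchByTag` and `buildTagQuery` are hypothetical helper names, and `collection` is assumed to be a driver Collection such as `db.collection("users")`. Since `$text` only looks at fields covered by the text index, the description subfield is not searched.

```javascript
// Index spec: only the "tag" subfield of each hobbies element is text-indexed.
const TAG_TEXT_INDEX = { "hobbies.tag": "text" };

// Build a $text filter. $text searches only the fields covered by the text index.
function buildTagQuery(term) {
  return { $text: { $search: term } };
}

// `collection` is assumed to be a Node.js driver Collection (hypothetical usage).
async function searchByTag(collection, term) {
  // Creating an index that already exists with the same spec does nothing.
  await collection.createIndex(TAG_TEXT_INDEX);
  return collection.find(buildTagQuery(term)).toArray();
}
```

Usage would look something like `const users = await searchByTag(db.collection("users"), "soccer");`.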

Related

Usage of TSVECTOR and to_tsquery to filter records in Sequelize

I've been trying to get full search text to work for a while now without any success. The current documentation has this example:
[Op.match]: Sequelize.fn('to_tsquery', 'fat & rat') // match text search for strings 'fat' and 'rat' (PG only)
So I've built the following query:
Title.findAll({
where: {
keywords: {
[Op.match]: Sequelize.fn('to_tsquery', 'test')
}
}
})
And keywords is defined as a TSVECTOR field.
keywords: {
type: DataTypes.TSVECTOR,
},
It seems like it's generating the query properly, but I'm not getting the expected results. This is the query that is being generated by Sequelize:
Executing (default): SELECT "id" FROM "Tests" AS "Test" WHERE "Test"."keywords" @@ to_tsquery('test');
And I know that there are multiple records in the database that have 'test' in their vector, such as the following one:
{
"id": 3,
"keywords": "'keyword' 'this' 'test' 'is' 'a'",
}
so I'm unsure as to what's going on. What would be the proper way to search for matches based on a TSVECTOR field?
Funnily enough, I am working on the same thing these days and hitting the same problem.
I think part of the solution is here (How to implement PostgresQL tsvector for full-text search using Sequelize?), but I haven't been able to get it to work yet.
If you find examples, I'm interested. Otherwise, as soon as I find a solution that works 100%, I will update this answer.
I also notice that when I add data (seeds) from Sequelize, it doesn't add the lexeme positions after the data in the field in question. Do you see the same behavior?
One last thing: did you create the index?
CREATE INDEX tsv_idx ON data USING gin(column);
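While debugging, it may help to run the match as a raw query and compare the result with what the model query returns. This is only a sketch, not a confirmed fix: `searchKeywords` is a hypothetical helper, `sequelize` is assumed to be an initialised Sequelize instance, `QueryTypes` is assumed to come from `require("sequelize")`, and the table and column names are copied from the logged query in the question.

```javascript
// Hypothetical helper for debugging: run the tsvector match as a raw query.
// "Tests" and "keywords" are taken from the logged query in the question.
function searchKeywords(sequelize, QueryTypes, term) {
  const sql = 'SELECT "id" FROM "Tests" WHERE "keywords" @@ to_tsquery(:term)';
  return sequelize.query(sql, {
    replacements: { term },  // bound safely by Sequelize, not concatenated
    type: QueryTypes.SELECT, // resolve with plain result rows
  });
}
```

If the raw query returns the rows you expect and the model query does not, the difference is in how the model query is built rather than in the stored vectors.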

How to make $elemMatch work for JSON array data in a Mango query?

I have a document in my application like the one below.
{
"Ct": "HH",
"Val": {
"Count": "A",
"Branch": "A"
}
}
When I try to retrieve this using the query below in CouchDB, no records are returned.
{
"selector": {
"Val": {
"$elemMatch": {
"Count": "A"
}
}
}
}
From the CouchDB documentation on $elemMatch[1]:
Matches and returns all documents that contain an array field with at
least one element that matches all the specified query criteria.
Val.Count is not an array field so $elemMatch is not appropriate.
Consider the CouchDB documentation regarding subfield queries[2]:
1.3.6.1.3. Subfields
A more complex selector enables you to specify the values for field of
nested objects, or subfields. For example, you might use a standard
JSON structure for specifying a field and subfield.
Example of a field and subfield selector, using a standard JSON
structure:
{
"imdb": {
"rating": 8
}
}
An abbreviated equivalent uses a dot notation to combine the field and
subfield names into a single name.
{
"imdb.rating": 8
}
Specifically, for the document in the question:
{
"selector": {
"Val.Count": "A"
}
}
1 CouchDB: 1.3.6.1.7. Combination Operators
2 CouchDB: 1.3.6.1.3. Subfields
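To make the dot notation concrete, here is a small local illustration of how a path like "Val.Count" resolves against the document. This is not CouchDB's implementation, only a simplified equality-only matcher; `getPath` and `matchesSelector` are hypothetical names. The `findBody` object is what would be sent as the request body to the `_find` endpoint.

```javascript
// Resolve a dot-notation path such as "Val.Count" against a plain object.
function getPath(doc, path) {
  return path.split(".").reduce(
    (value, key) =>
      value != null && typeof value === "object" ? value[key] : undefined,
    doc
  );
}

// Simplified matcher: every selector key must equal the value found at that path.
function matchesSelector(doc, selector) {
  return Object.keys(selector).every(
    (path) => getPath(doc, path) === selector[path]
  );
}

const doc = { Ct: "HH", Val: { Count: "A", Branch: "A" } };
const findBody = { selector: { "Val.Count": "A" } }; // body for POST /{db}/_find

console.log(matchesSelector(doc, findBody.selector)); // true
```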

Cloudant Search: what are the conditions for using the count facet?

I am trying to set up a search index using Cloudant, but I find the documentation pretty confusing. It states:
FACETING
In order to use facets, all the documents in the index must include all the fields that have faceting enabled. If your documents do not include all the fields, you will receive a bad_request error with the following reason, “dim field_name does not exist.”
If each document does not contain all the fields for facets, it is recommended that you create separate indexes for each field. If you do not create separate indexes for each field, you must include only documents that contain all the fields. Verify that the fields exist in each document using a single if statement.
Counts
The count facet syntax takes a list of fields, and returns the number of query results for each unique value of each named field.
The count operation works only if the indexed values are strings. The indexed values cannot be mixed types. For example, if 100 strings are indexed, and one number, then the index cannot be used for count operations. You can check the type using the typeof operator, and convert using parseInt, parseFloat and .toString() functions.
Specifically, what does it mean that "all the documents in the index must include all the fields that have faceting enabled"?
For example, if my database consists of the following doc:
{
"_id": "mydoc",
"subjects": [ "subject A", "subject B" ]
}
And I write a search index like so:
function (doc) {
for(var i=0; i < doc.subjects.length; i++)
index("hasSubject", doc.subjects[i], {facet: true});
}
Would this be illegal because mydoc doesn't have a field called hasSubject? And if we rewrite the document to look like this:
{
"_id": "mydoc",
"hasSubject": true,
"subjects": [ "subject A", "subject B" ]
}
Would that suddenly make it OK...?
The new documentation is at https://console.ng.bluemix.net/docs/services/Cloudant/api/search.html#faceting; however, the entry on faceting is the same, so no big deal there.
To answer your question though, I think what the documentation is saying is that all the JSON docs in your database must contain the subjects field, which is what you're declaring you want to facet on in your example.
So I would also consider defining your search index like so:
function (doc) {
if (doc.subjects) {
for(var i=0; i < doc.subjects.length; i++) {
if (typeof doc.subjects[i] == "string") {
index("hasSubject", doc.subjects[i], {facet: true});
}
}
}
}
And if you had a doc like this in your database:
{
"_id": "mydoc",
"hasSubject": true
}
I think that would suddenly make your facets NOT ok.
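To see which values the guarded index function would actually hand to the index, you can run it locally against a few sample documents. This is only a simulation: the `index` function below is a stand-in for the one Cloudant provides, the sample documents other than mydoc are made up, and nothing here reproduces Cloudant's own facet validation.

```javascript
// Stand-in for the index() function that Cloudant provides to index functions.
const emitted = [];
function index(name, value, options) {
  emitted.push({ name, value, facet: Boolean(options && options.facet) });
}

// Guarded index function from the answer above.
function indexDoc(doc) {
  if (doc.subjects) {
    for (var i = 0; i < doc.subjects.length; i++) {
      if (typeof doc.subjects[i] == "string") {
        index("hasSubject", doc.subjects[i], { facet: true });
      }
    }
  }
}

// Hypothetical sample documents: one complete, one without subjects, one mixed.
[
  { _id: "mydoc", subjects: ["subject A", "subject B"] },
  { _id: "nosubjects", hasSubject: true },
  { _id: "mixed", subjects: ["subject C", 42] },
].forEach(indexDoc);

console.log(emitted.map((e) => e.value)); // [ 'subject A', 'subject B', 'subject C' ]
```

The document without a subjects array and the non-string value are both skipped, so only string values are indexed under hasSubject.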

MongoDB find field that is part of a longer string

I know how to search for a field that contains part of my search term in MongoDB and Node, i.e.:
Record:
{
name: "Hello my name is robinson"
}
Query:
{
name: /robinson/i
}
However, I don't know how to do the reverse (or even whether it is possible), i.e.:
Query:
{
name: "Hello my name is robinson"
}
Record:
{
name: "robinson"
}
I am trying to make rules to categorise strings based on their content. Any help is much appreciated. Content may not always be broken down into words, otherwise I could have just done a split by space and searched for each one.
With a text index you should be able to find these documents using a text search on the whole phrase.
http://docs.mongodb.org/manual/reference/operator/query/text/#match-any-of-the-search-terms
If the search string is a space-delimited string, $text operator performs a logical OR search on each term and returns documents that contains any of the terms.
In your example, you create an index in the "name" field of your collection:
db.collection.createIndex( { name: "text" } )
Then you can query with the $text operator:
db.collection.find({$text: { $search: "Hello my name is robinson"}})
As stated in the docs, the query returns documents that contain any of the terms "Hello", "my", "name", "is" or "robinson".
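Note that the $text query matches on terms, so it relies on the content being broken into words. For the case mentioned in the question where it isn't, one fallback is to do the containment check in application code. This is a minimal sketch, not a MongoDB feature: it assumes the rule records are few enough to load into memory, and `recordsContainedIn` is a hypothetical helper name.

```javascript
// Return the records whose name occurs anywhere inside `input` (case-insensitive).
function recordsContainedIn(input, records) {
  const haystack = input.toLowerCase();
  return records.filter(
    (record) =>
      typeof record.name === "string" &&
      record.name.length > 0 &&
      haystack.includes(record.name.toLowerCase())
  );
}

// Hypothetical rule records, e.g. loaded with collection.find().toArray().
const rules = [{ name: "robinson" }, { name: "crusoe" }, { name: "name is" }];

console.log(
  recordsContainedIn("Hello my name is robinson", rules).map((r) => r.name)
); // [ 'robinson', 'name is' ]
```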

need guidance on node.js multilingual presentation

I am new to node (v0.10) stack.
I am trying to achieve the following:
I have (hopefully) multilingual articles in the latest MongoDB such as:
_id
...more fields...
text: [
{lang: 'en', title: 'some title', body: 'body', slug: 'slug'},
....
]
Every time I display an article in a specific language, I query as follows:
var query = Model.findOne({'text.slug': slug});
query.exec(function(err, doc){
async.each(doc.text, function(item, callback){
if (item.lang == articleLang) {
//populate the article to display
}
});
res.render('view', {post:articleToDisplay});
});
Slug is unique for each language!
The problem I have is that mongo will return the whole doc with all subdocs and not just the subdoc I searched for. Now I have to choose to iterate over all subdocs and display the appropriate one on client side or use async.each on the server to get the subdoc I need and only send to the views that one. I am doing it with async on the server. Is that OK? Also async iterates asynchronously but node still waits for the whole loop to finish and then renders the view. Am I missing anything thinking that the user is actually blocked until the async.each finishes? I am still trying to wrap my head around this asynchronous execution. Is there a way I can possibly improve how I manage this code? It seems to be quite standard procedure with subdocs!
Thanks in advance for all your help.
To achieve what you want, you can make use of the aggregation pipeline. A simple findOne() with just a filter would not help, since find() and findOne() return the whole document when it matches the search criteria, so you would have to filter out the subdocuments in your own code rather than letting MongoDB do it.
In the aggregation pipeline you can use the $unwind and $match operators to achieve this.
$unwind:
Deconstructs an array field from the input documents to output a
document for each element. Each output document is the input document
with the value of the array field replaced by the element.
First unwind the document based on the text values array.
$match:
Filters the documents to pass only the documents that match the
specified condition(s) to the next pipeline stage.
Then use the $match operator to match the appropriate documents.
db.Model.aggregate([
{$unwind:"$text"},
{$match:{"text.slug":slug,"text.lang":articleLang}}
])
This returns only one document, with its text field containing a single object rather than an array, for example:
{ "_id" : ... ,.., "text" : { "slug" : "slug", "lang" : "en" ,...} }
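From Node, the pipeline above could be wrapped roughly like this. It is a sketch under assumptions: `Model` is taken to be a Mongoose model, a Node version with async/await is assumed, and `buildArticlePipeline` and `findArticleText` are hypothetical helper names. The extra leading $match is an addition to the pipeline above; it narrows the input before $unwind and does not change the result.

```javascript
// Build the pipeline: narrow by slug first, unwind, then keep the one subdocument.
function buildArticlePipeline(slug, lang) {
  return [
    { $match: { "text.slug": slug } }, // optional early filter (an addition)
    { $unwind: "$text" },
    { $match: { "text.slug": slug, "text.lang": lang } },
  ];
}

// `Model` is assumed to be a Mongoose model; resolves to the matching
// text subdocument, or null when nothing matches.
async function findArticleText(Model, slug, lang) {
  const docs = await Model.aggregate(buildArticlePipeline(slug, lang)).exec();
  return docs.length > 0 ? docs[0].text : null;
}
```

In the route handler this replaces the async.each loop: `const articleToDisplay = await findArticleText(Model, slug, articleLang);` followed by the res.render call.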
