MarkLogic: "highlight" does not seem to work with Node.js and QueryBuilder

I am trying to get an extract of the text, with the searched words highlighted, from a JSON collection.
My search syntax is:
qb.word(qb.field('doc_text'),vartxt)
with 'doc_text' declared as a field
(field type: root, include root: false, includes: doc_text), in a Node.js application.
The search works, and it is correctly performed on this field.
However, txt[0].results[kl].matches[0]['match-text'] contains the first three properties of the JSON document,
not an extract from 'doc_text' with the matched words.
I have another application in which highlighting works correctly, but it is based on XML.
Did I forget something in the field declaration, does highlighting work differently for JSON
and XML data, or does highlighting simply not work on JSON via Node.js and QueryBuilder?
Kind regards

Fields do not work quite the same way in XML and JSON. I think you're running into this limitation:
http://docs.marklogic.com/guide/app-dev/json#id_24090
The value of a field in XML can be the concatenation of all the text nodes, but the same does not apply in JSON.

I think I understand now!
This query gives a correct snippet, with the extract and the words highlighted:
mkcq.and(mkcq.collection('document'), mkcq.word(mkcq.field('doc_text'), 'connaitre'))
On the other hand, this query returns the first three fields of the JSON instead:
mkcq.and(mkcq.collection('document'), mkcq.word(mkcq.field('doc_text'), 'connaitre'), mkcq.value(mkcq.element('','doc_user'), 'mbp'))
I do not know whether this is expected, but it could be worked around either by using the simpler query and filtering the returned records, or even by using a custom snippeter.
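A minimal sketch of the first workaround: run only the collection + word query (which snippets correctly) and keep the results whose document has the wanted doc_user. It is written as a plain function so it runs without a MarkLogic connection. The shapes are assumptions modeled on the question: `summary` is an object with a `results` array of `{ uri, matches }` entries (like `txt[0]`), and `documents` is an array of `{ uri, content }` entries, where `content` is the parsed JSON document. The function name is hypothetical.

```javascript
// Keep only the snippeted search results whose source document belongs to `user`.
// Assumed shapes:
//   summary   = { results: [{ uri, matches: [...] }, ...] }
//   documents = [{ uri, content: { doc_user, doc_text, ... } }, ...]
function filterResultsByUser(summary, documents, user) {
  // Index the returned documents by URI for constant-time lookups.
  const byUri = new Map(documents.map((doc) => [doc.uri, doc.content]));
  return summary.results.filter((result) => {
    const content = byUri.get(result.uri);
    // Drop results whose document was not returned or belongs to another user.
    return content !== undefined && content.doc_user === user;
  });
}

module.exports = { filterResultsByUser };
```

If, as in the question, `txt[0]` is the summary and the remaining entries are the documents, a call might look like `filterResultsByUser(txt[0], txt.slice(1), 'mbp')`; check this against what your query call actually returns. The snippets stay intact because the query that produced them is unchanged.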
Kind regards

Related

How can I query channels by checking whether a field contains a string on GetStream.io?

I would like to know how I can query fields with the GetStream.io API using something like a regex or "contains" syntax for matching strings.
Example
I would like to query channels whose name contains xyZ, case-insensitively, instead of matching the value exactly. I've been trying the following, without success:
{ name: {$regex : ".*xyz.*"}}
Thanks in advance

Please note: the query syntax documentation is now available here:
https://getstream.io/chat/docs/javascript/query_syntax_operators/
More info about running channel queries themselves:
https://getstream.io/chat/docs/javascript/query_channels/

Only a subset of Mongo-style queries is supported, as mentioned in https://getstream.io/chat/docs/#query_syntax, so your use case is not really supported with this kind of query. One solution might be to store custom data on your channel and query for exact values. You can set that data when you initialize (https://getstream.io/chat/docs/#initialize_channel) or update the channel.
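Since a `$regex` filter is not available, another fallback is to fetch channels with the filters that are supported and apply the case-insensitive "contains" check on the client. A minimal sketch, assuming the returned channels have already been mapped to plain objects with a `name` property (that mapping, and the function name, are hypothetical):

```javascript
// Case-insensitive "contains" filter applied client-side, as a fallback for
// the unsupported { name: { $regex: ... } } server-side filter.
// Assumption: channels are plain objects with an optional `name` string.
function filterByNameContains(channels, needle) {
  const wanted = needle.toLowerCase();
  return channels.filter(
    (channel) =>
      typeof channel.name === "string" &&
      channel.name.toLowerCase().includes(wanted)
  );
}

module.exports = { filterByNameContains };
```

This only filters what has already been fetched, so on a large number of channels it requires paging through all of them; the custom-data approach above keeps the filtering on the server.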

MarkLogic Text Search: Return Results Based on Matches Inside Attributes

I am using MarkLogic's search:search() functions to handle searching within my application, and I have a use case where users need to be able to perform a text search that returns matches from an attribute on my document.
For example, using this document:
<document attr="foo attribute value">Some child content</document>
I want users to be able to perform a text search (not using constraints) for "foo", and to return my document based on the match within the attribute @attr. Is there some way to configure the query options to allow this?
Typing in attr:"foo" is not a workable solution, so attribute range constraints won't help, and users still need to be able to search for other child content that is not in the attribute node. I'm thinking there may be a way to OR a cts:query into the search via the options, so that this attribute is searched?
Open to any and all other solutions.
Thanks!
Edit:
Some additional information, to help clarify:
I need to be able to find matches within the attribute, and elsewhere within the content. Using the example above, searches for "foo", "child content", or "foo child content" should all return my document as a result. This means that any query options that are AND'd with the search (like <additional-query>, which is intended to help constrain your search and not expand it) won't work. What I'm looking for is (likely) an additional query option that will be OR'd with the original search, so as to allow searching by child node content, attribute content, or a mix of the two.
In other words, I'd like MarkLogic to treat any attribute node content exactly the same as element text nodes, as far as searching is concerned.
Thanks!!

You could accomplish this search with a serialized element-attribute-word cts query in the additional-query options for the search API. The element attribute word query will use the universal index to match individual tokens within attributes.
In MarkLogic 9 you may be able to use the following to perform your search:
import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";
search:search("",
  <options xmlns="http://marklogic.com/appservices/search">
    <additional-query>
      <cts:element-attribute-word-query xmlns:cts="http://marklogic.com/cts">
        <cts:element>document</cts:element>
        <cts:attribute>attr</cts:attribute>
        <cts:text>foo</cts:text>
      </cts:element-attribute-word-query>
    </additional-query>
  </options>
)

MarkLogic has ways to parse query text and map a value to an attribute word or value query.
First, you can use cts:parse():
http://docs.marklogic.com/guide/search-dev/cts_query#id_71878
http://docs.marklogic.com/cts.parse
Second, you can use search:search() with constraints defined in an XML payload:
http://docs.marklogic.com/guide/search-dev/query-options#id_39116
http://docs.marklogic.com/guide/search-dev/appendixa#id_36346

I'd look into using the <default> option of <term>. For details see http://docs.marklogic.com/guide/search-dev/appendixa#id_31590
Alternatively, consider doing query expansion. The idea is that an end user sends a search string. You parse it using search:parse or cts:parse (as suggested by Erik), and instead of submitting that query as-is to MarkLogic, you process the cts:query tree to look for terms you want to adjust or expand. This is typically used to automatically blend in synonyms, related terms, or translations, but it could also be used to copy individual terms and automatically add queries on attributes for them.
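A minimal sketch of that expansion at the term level: each term becomes "(term anywhere) OR (term inside document/@attr)", and the per-term groups are AND'd, so "foo", "child content" and "foo child content" would all match the example document. Two loud assumptions: naive whitespace splitting stands in for search:parse / cts:parse (a real implementation should walk the parsed query tree), and the object layout is modeled on MarkLogic's JSON structured query syntax (`term-query`, `word-query` with `element`/`attribute`, `or-query`, `and-query`); verify the property names against the structured query reference for your version. The function name is hypothetical.

```javascript
// Term-level query expansion: OR each term with an attribute word query on
// elementName/@attributeName, then AND the per-term groups together.
// Assumption: whitespace tokenizing replaces a real query parser, and the
// output mimics MarkLogic's JSON structured query layout.
function expandQuery(queryText, elementName, attributeName) {
  const terms = queryText.split(/\s+/).filter((t) => t.length > 0);
  const perTerm = terms.map((term) => ({
    "or-query": {
      queries: [
        // Match the term in ordinary text content.
        { "term-query": { text: [term] } },
        // Match the same term inside the attribute.
        {
          "word-query": {
            element: { name: elementName, ns: "" },
            attribute: { name: attributeName, ns: "" },
            text: [term],
          },
        },
      ],
    },
  }));
  return { query: { queries: [{ "and-query": { queries: perTerm } }] } };
}

module.exports = { expandQuery };
```

Because the attribute clause is OR'd per term rather than AND'd with the whole search, this expands the result set instead of constraining it, which is what the question's edit asks for.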
HTH!

Querying documents which don't have a given field, or where it is an empty string, in Solr

I am running a Solr query where I need to find documents without a given field, say 'name', and I am trying the following:
$q=+status:active -name:["" TO *]'
But it returns all documents, both with and without that field.
Can anyone help me figure this out?
The field name is a normal string type and is indexed.
I am using Node.js.

According to docs:
-field:[* TO *] finds all documents without a value for field
Update
I tried it, but it still returns documents where the field is non-empty.
Then my wild guess is that you are using the main query q instead of the filter query fq. Since your query has multiple clauses, I assume q does some extra processing to find the most relevant documents for you, which can lead to unwanted results.
If you want a strict set of results, you should use the filter query fq instead; see the docs.
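A minimal sketch of the request parameters for that suggestion, built with Node's global `URLSearchParams`: match everything in `q` and move the strict conditions into separate `fq` parameters, which Solr intersects. It only covers the "field missing" case from the answer, not empty-string values; the function name is hypothetical.

```javascript
// Build Solr select parameters for "active documents with no value in fieldName".
// The strict conditions go into filter queries (fq) instead of the main query q.
function buildMissingFieldParams(fieldName) {
  const params = new URLSearchParams();
  params.set("q", "*:*"); // match all documents; the filters do the narrowing
  params.append("fq", "status:active");
  params.append("fq", `-${fieldName}:[* TO *]`); // no value for fieldName
  return params;
}

module.exports = { buildMissingFieldParams };
```

The result of `params.toString()` can be appended to your core's `/select?` URL; the host and core name depend on your setup.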

CAML Query with Contains and Or Clause Issue

What I want to achieve: take a keyword array as input and query a SharePoint list to return all rows that contain any of the keywords.
I have built a simple CAML query to query my list with one keyword (pdf):
<Query><Where><Contains><FieldRef Name='Keyword'/><Value Type='Text'>pdf</Value></Contains></Where></Query>
This works fine.
But when I try to use an Or clause in the CAML query (see below), I get the following error:
"One or more field types are not installed properly. Go to the list settings page to delete these fields."
<Query><Where><Or><Contains><FieldRef Name='Keyword'/><Value Type='Text'>pdf</Value></Contains></Or></Where></Query>
I googled for the syntax and everything looks good. Please let me know what is missing.
Thanks in advance.

In a CAML query, an Or element must contain exactly two conditions.
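For a keyword array, the two-condition rule means the conditions have to be nested: Or(a, Or(b, c)), and so on. A minimal sketch that builds such a query string from an array, reproducing the single-keyword query from the question when given one keyword. The function names are hypothetical, and `fieldName` must be the column's internal name.

```javascript
// Escape characters that are special in XML text and attribute values.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/'/g, "&apos;")
    .replace(/"/g, "&quot;");
}

// One <Contains> condition for a single keyword.
function containsClause(fieldName, keyword) {
  return (
    `<Contains><FieldRef Name='${escapeXml(fieldName)}'/>` +
    `<Value Type='Text'>${escapeXml(keyword)}</Value></Contains>`
  );
}

// Match rows whose field contains any keyword. Folding from the right
// nests the conditions so that every <Or> has exactly two children.
function buildContainsAnyQuery(fieldName, keywords) {
  if (keywords.length === 0) {
    throw new Error("at least one keyword is required");
  }
  const conditions = keywords.map((k) => containsClause(fieldName, k));
  const combined = conditions.reduceRight((acc, cond) => `<Or>${cond}${acc}</Or>`);
  return `<Query><Where>${combined}</Where></Query>`;
}

module.exports = { buildContainsAnyQuery };
```

With `["pdf", "doc", "xls"]` this produces `<Or>pdf-condition<Or>doc-condition xls-condition</Or></Or>` inside the Where element.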

The field reference name must be the internal name. You can find it by going to the column's page in the list/library settings; the name is at the end of the URL. Spaces and underscores in the name must be handled differently: for example, a space in the display name is encoded as _x0020_ in the internal name.

WildcardQuery error in Solr

I use Solr to search for documents, and when I try to search using the query "id:*", I get a query parser exception saying that it cannot parse a query with * or ? as the first character.
HTTP Status 400 - org.apache.lucene.queryParser.ParseException: Cannot parse 'id:*': '*' or '?' not allowed as first character in WildcardQuery
type Status report
message org.apache.lucene.queryParser.ParseException: Cannot parse 'id:*': '*' or '?' not allowed as first character in WildcardQuery
description The request sent by the client was syntactically incorrect (org.apache.lucene.queryParser.ParseException: Cannot parse 'id:*': '*' or '?' not allowed as first character in WildcardQuery).
Is there a patch to get this working with just *? Or is such a query very costly?

If you want all documents, do a query on *:*
If you want all documents with a certain field (e.g. id) try id:[* TO *]

Lucene doesn't allow you to start WildcardQueries with an asterisk by default, because those are incredibly expensive queries and will be very, very, very slow on large indexes.
If you're using the Lucene QueryParser, call setAllowLeadingWildcard(true) on it to enable it.
If you want all of the documents with a certain field set, you are much better off querying or walking the index programmatically than using QueryParser. You should really only use QueryParser to parse user input.

id:[a* TO z*] id:[0* TO 9*] etc.
I just tried this in Luke (lukeall) on my index and it worked, so it should work in Solr, which uses the standard query parser. I don't actually use Solr.
In plain Lucene there is a good reason why you'd never query for every document: to run a query you must open an IndexReader on the index directory and apply the query to it. You could therefore skip the query entirely and use the IndexReader methods numDocs() to get a count of all the documents and document(int n) to retrieve any of them.

If you are just trying to get all documents, Solr does support the *:* query. It's the only case I know of where Solr lets you begin a query with *. You have probably seen it as the default query on the Solr admin page.
If you are trying to do a more specific query with * as the first character, say id:*456, then one of the best ways I've seen is to index that field twice: once normally (field name: id), and once with all the characters reversed (field name: reverse_id). Then you can run the equivalent of id:*456 by sending the query reverse_id:654* instead. Hope that makes sense.
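A minimal sketch of that reversed-field trick on the client side: add a reversed copy of the id before indexing, and rewrite a suffix (leading-wildcard) search on `id` into a prefix (trailing-wildcard) search on `reverse_id`. The field name `reverse_id` follows the answer; the function names are hypothetical, and query-syntax escaping of special characters is not handled.

```javascript
// Reverse a string by code point so surrogate pairs stay intact.
function reverseString(value) {
  return Array.from(value).reverse().join("");
}

// Index time: store the id a second time with its characters reversed.
function withReversedId(doc) {
  return { ...doc, reverse_id: reverseString(doc.id) };
}

// Query time: id:*<suffix> becomes reverse_id:<reversed suffix>*
function suffixQuery(suffix) {
  return `reverse_id:${reverseString(suffix)}*`;
}

module.exports = { withReversedId, suffixQuery };
```

For example, a document with id `123456` is stored with `reverse_id` `654321`, which the prefix query `reverse_id:654*` matches, just as `id:*456` would.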
You can also search the Solr user group mailing list at http://www.mail-archive.com/solr-user#lucene.apache.org/ where questions like this come up quite often.

The following Solr issue is a request to make the default Lucene query parser configurable.
https://issues.apache.org/jira/browse/SOLR-218
In this issue you can find the following description how to 'patch' Solr. This modification would allow you to start queries with a *.
Jonas Salk: I've basically updated only one Java file: SolrQueryParser.java.
public SolrQueryParser(IndexSchema schema, String defaultField) {
    ...
    setAllowLeadingWildcard(true);
    setLowercaseExpandedTerms(true);
    ...
}
...
public SolrQueryParser(QParser parser, String defaultField, Analyzer analyzer) {
    ...
    setAllowLeadingWildcard(true);
    setLowercaseExpandedTerms(true);
    ...
}
I'm not sure if setLowercaseExpandedTerms is needed...

I'm assuming with id:* you're just trying to match all documents, right?
I've never used Solr before, but in my Lucene experience, when ingesting data we've added a hidden field to every document; then, when we need to return every record, we search for the string constant in that field, which is the same for every record.
If you can't add a field like that in your situation, you could use a RegexQuery with a regex that would match anything that could be found in the id field.
Edit: to actually answer the question: I've never heard of a patch to make that work, and I would be surprised if it could be made to work reasonably well. See this question for a reason why unconstrained PrefixQueries can cause problems.
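A minimal sketch of the hidden-field approach described in this answer: stamp every document with the same constant before it is indexed, then retrieve all records with one exact term query. The field name `all_docs` and the value `1` are hypothetical; any field and value shared by every document would do.

```javascript
// Hypothetical marker field and value shared by every indexed document.
const MARKER_FIELD = "all_docs";
const MARKER_VALUE = "1";

// Index time: add the marker to a copy of the document.
function withMarker(doc) {
  return { ...doc, [MARKER_FIELD]: MARKER_VALUE };
}

// Query time: an exact term query that every marked document matches.
function matchAllQuery() {
  return `${MARKER_FIELD}:${MARKER_VALUE}`;
}

module.exports = { withMarker, matchAllQuery };
```

Because this is an exact term match rather than a wildcard, it avoids the leading-wildcard restriction entirely.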

Actually, I have been using a workaround for this: I prepend a character to the id, e.g. A1, A2, etc.
With such values in the field, it is possible to search using the query id:A*.
But I would love to know whether a true solution exists.
