How to use MarkLogic search options by name - search

I'm using the ML9 Java API to upload a search options file to the DB with a name that I can use later in my search call. I would now like to write an xquery transform to highlight the query-matches in a set of elements in the response. Standard snippets won't work for me since they only bring back the fields in which there are matches and because they may not bring back the complete field value, but only the immediate context of the match.
So I want to use the cts:highlight function in a custom transform and want to pass to it the name of the options that I have uploaded into the DB. My question is how I can best get the options element from the DB using the name passed in to the transform method. I want to use this to construct the cts:query that I can pass in to the cts:highlight call as in:
let $query := cts:query(search:parse($query-string, $options))
let $result := cts:highlight($doc, $query, <markup>{$cts:text}
</markup>)
I was thinking I could pass in the query-string and the name of the pre-loaded options and use these to construct the cts:query, but don't know how to get the options from the name.

I found a way to avoid having to read the options. Setting the option 'return-query' to true adds a search:query node to the search:response which is passed to the transform method as the document-node. I'm then able to get this directly in the transform method to use in cts:highlight as:
let $query := cts:query($response/search:response/search:query/*[1])

The options are stored in the modules database associated with your REST instance. You could theoretically dig them out, though that would be relying on an implementation detail (the URI).
You might look into a combination of extract-document-data, as Sam mentioned, plus a search result transform, rather than the heavier approach of doing your own search through what I'd guess is a read transform.
Another alternative might be a custom snippeter that you pull into your options via transform-results. See http://docs.marklogic.com/guide/search-dev/query-options#id_58295.

Related

Is there a way to pass a parameter to google bigquery to be used in their "IN" function

I'm currently writing an app that accesses google bigquery via their "#google-cloud/bigquery": "^2.0.6" library. In one of my queries I have a where clause where i need to pass a list of ids. If I use UNNEST like in their example and pass an array of strings, it works fine.
https://cloud.google.com/bigquery/docs/parameterized-queries
However, I have found that UNNEST can be really slow and just want to use IN on its own and pass in a string list of ids. No matter what format of string list I send, the query returns null results. I think this is because of the way they convert parameters in order to avoid sql injection. I have to use a parameter because I, myself want to avoid SQL injection attacks on my app. If i pass just one id it works fine, but if i pass a list it blows up so I figure it has something to do with formatting, but I know my format is correct in terms of what IN would normally expect i.e. IN ('', '')
Has anyone been able to just pass a param to IN and have it work? i.e. IN (#idParam)?
We declare params like this at the beginning of the script:
DECLARE var_country_ids ARRAY<INT64> DEFAULT [1,2,3];
and use like this:
WHERE if(var_country_ids is not null,p.country_id IN UNNEST(var_country_ids),true) AND ...
as you see we let NULL and array notation as well. We don't see issues with speed.

How to add collections in transformations when writing(creating) a Document in MarkLogic

I wrote a transformation in xquery which unquotes an XML-String and inserts a element with its content. This works fine.
I need to create a collection dependant on the root element of this element as well. I can't do this on new documents as xdmp:document-add-collections() is not working. How do I add the collection to new Documents in transformations?
Here my ServerSide xQuery Code:
xquery version "1.0-ml";
module namespace transform = "http://marklogic.com/rest-api/transform/smtextdocuments";
import module namespace mem = "http://xqdev.com/in-mem-update" at '/MarkLogic/appservices/utils/in-mem-update.xqy';
declare function transform(
$context as map:map,
$params as map:map,
$content as document-node()
) as document-node()
{
let $uri := base-uri($content)
let $doccont := $content/smtextdocuments/documentcontent
let $newcont := xdmp:unquote($doccont)
let $contname := node-name($newcont/*)
let $result := if ( exists($content/smtextdocuments/content))
then mem:node-replace($content/smtextdocuments/content, <content>11{$newcont}</content>)
else mem:node-insert-after($doccont, <content>{$newcont}</content>)
let $log := xdmp:log($content)
return (
$result,
xdmp:document-add-collections($uri, fn:string($contname)),
xdmp:document-remove-collections($uri, "raw")
)
};
The script ist running with the java api (4.0.4) create methode via parameter ServerTransform transform. As per documentation the transformation script is running before the document is stored in the Database.
Its a new document; I need to transform the content and then create the collection.
I can see the document after the create, the content is available. Just the collection is missing. I can try xdmp:document-insert method but is it correct writing the document while create is running?.
The transform mechanism of the Java API / REST API takes responsibility for the document write. At present, there's no way for the transform to supply collections to the writer. That would be a reasonable request for enhancement.
The transform shouldn't attempt to write the document, because the writer would also attempt to write the same document.
One alternative would be to transform the document in Java before writing it and specify the collection as part of the write request.
Another alternative would be to rewrite the transform as a resource service extension, implement the write within the resource service extension, and modify the Java client to send the document to the resource service extension.
Depending on the model, a final alternative might be to use a range index on an element within the document to collect documents into sets instead of using a collection on the document.
Hoping that helps,
What do you mean by "new documents"? Is the document already inserted into the MarkLogic database at the time you are adjusting the collections of it? If not, you may want to modify your return to ($result, xdmp:document-insert($uri, $result, xdmp:default-permissions(), fn:string($contname)) ) for that case.
Otherwise, can you edit your question to share the error or problem more specifically you are facing?
It is a pity that REST transforms do not allow this, like MLCP transforms do. Until changed you have the options drawn by ehennum, or you can consider delaying adding of collections to a pre- or post-commit trigger. It takes some overhead, but it sometimes makes perfect sense to do something like that in a trigger, since it makes sure it is always enforced, and a good place to do content validation, audit logging, and things like that as well.
HTH!

AEM Query builder exclude a folder in search

I need to create a query where the params are like:
queryParams.put("path", "/content/myFolder");
queryParams.put("1_property", "myProperty");
queryParams.put("1_property.operation", "exists");
queryParams.put("p.limit", "-1");
But, I need to exclude a certain path inside this blanket folder , say: "/content/myFolder/wrongFolder" and search in all other folders (whose number keeps on varying)
Is there a way to do so ? I didn't find it exactly online.
I also tried the unequals operation as the parent path is being saved in a JCR property, but still no luck. I actually need unlike to avoid all occurrences of the path. But there is no such thing:
path=/main/path/to/search/in
group.1_property=cq:parentPath
group.1_property.operation=unequals
group.1_property.value=/path/to/be/avoided
group.2_property=myProperty
group.2_property.operation=exists
group.p.or=true
p.limit=-1
This is an old question but the reason you got more results later lies in the way in which you have constructed your query. The correct way to write a query like this would be something like:
path=/main/path/where
property=myProperty
property.operation=exists
property.value=true
group.p.or=true
group.p.not=true
group.1_path=/main/path/where/first/you/donot/want/to/search
group.2_path=/main/path/where/second/you/donot/want/to/search
p.limit=-1
A couple of notes: your group.p.or in your last comment would have applied to all of your groups because they weren't delineated by a group number. If you want an OR to be applied to a specific group (but not all groups), you would use:
path=/main/path/where
group.1_property=myProperty
group.1_property.operation=exists
group.1_property.value=true
2_group.p.or=true
2_group.p.not=true
2_group.3_path=/main/path/where/first/you/donot/want/to/search
2_group.4_path=/main/path/where/second/you/donot/want/to/search
Also, the numbers themselves don't matter - they don't have to be sequential, as long as property predicate numbers aren't reused, which will cause an exception to be thrown when the QB tries to parse it. But for readability and general convention, they're usually presented that way.
I presume that your example was just thrown together for this question, but obviously your "do not search" paths would have to be children of the main path you want to search or including them in the query would be superfluous, the query would not be searching them anyway otherwise.
AEM Query Builder Documentation for 6.3
Hope this helps someone in the future.
Using QueryBuilder you can execute:
map.put("group.p.not",true)
map.put("group.1_path","/first/path/where/you/donot/want/to/search")
map.put("group.2_path","/second/path/where/you/donot/want/to/search")
Also I've checked PredicateGroup's class API and they provide a setNegated method. I've never used it myself, but I think you can negate a group and combine it into a common predicate with the path you are searching on like:
final PredicateGroup doNotSearchGroup = new PredicateGroup();
doNotSearchGroup.setNegated(true);
doNotSearchGroup.add(new Predicate("path").set("path", "/path/where/you/donot/want/to/search"));
final PredicateGroup combinedPredicate = new PredicateGroup();
combinedPredicate.add(new Predicate("path").set("path", "/path/where/you/want/to/search"));
combinedPredicate.add(doNotSearchGroup);
final Query query = queryBuilder.createQuery(combinedPredicate);
Here is the query to specify operator on given specific group id.
path=/content/course/
type=cq:Page
p.limit=-1
1_property=jcr:content/event
group.1_group.1_group.daterange.lowerBound=2019-12-26T13:39:19.358Z
group.1_group.1_group.daterange.property=jcr:content/xyz
group.1_group.2_group.daterange.upperBound=2019-12-26T13:39:19.358Z
group.1_group.2_group.daterange.property=jcr:content/abc
group.1_group.3_group.relativedaterange.property=jcr:content/courseStartDate
group.1_group.3_group.relativedaterange.lowerBound=0
group.1_group.2_group.p.not=true
group.1_group.1_group.p.not=true

When to use 'app.params' and 'req.params'?

Since, I can get parameters from both the methods using a code similar to the one below:
req.params.<PARAM NAME> in single/many separate app.METHOD function(s)
(think this may result in code repetition)
&
app.params(<ARRAY>,<CALLBACK>) function, independent of the app.METHOD functions, and called if the URL contains any parameter (:id, :name .etc)
What are the use-cases to apply one over the other?
My best guess would be is using app.params for parameter validation or some sort of preprocessing. For example the express docs provide and example where you attach req.user information to the request using app.params and after that you can work directly with the user information instead of processing the parameter again. Using req.params would be more specific in terms of processing the specific query. For example I'd use req.params for a REST endpoint which should perform an operation by id (update/delete) as in general there shouldn't be any additional preprocessing involder.

Best practice to pass query conditions in ajax request

I'm writing a REST api in node js that will execute a sql query and send the results;
in the request I need to send the WHERE conditions; ex:
GET 127.0.0.1:5007/users //gets the list of users
GET 127.0.0.1:5007/users
id = 1 //gets the user with id 1
Right now the conditions are passed from the client to the rest api in the request's headers.
In the API I'm using sequelize, an ORM that needs to receive WHERE conditions in a particular form (an object); ex: having the condition:
(x=1 AND (y=2 OR z=3)) OR (x=3 AND y=1)
this needs to be formatted as a nested object:
-- x=1
-- AND -| -- y=2
| -- OR ----|
| -- z=3
-- OR -|
|
| -- x=3
-- AND -|
-- y=1
so the object would be:
Sequelize.or (
Sequelize.and (
{x=1},
Sequelize.or(
{y=2},
{z=3}
)
),
Sequelize.and (
{x=3},
{y=1}
)
)
Now I'm trying to pass a simple string (like "(x=1 AND (y=2 OR z=3)) OR (x=3 AND y=1)"), but then I will need a function on the server that can convert the string in the needed object (this method in my opinion has the advantage that the developer writing the client, can pass the where conditions in a simple way, like using sql, and this method is also indipendent from the used ORM, with no need to change the client if we need to change the server or use a different ORM);
The function to read and convert the conditions' string into an object is giving me headache (I'm trying to write one without success, so if you have some examples about how to do something like this...)
What I would like to get is a route capable of executing almost any kind of sql query and give the results:
now I have a different route for everything:
127.0.0.1:5007/users //to get all users
127.0.0.1:5007/users/1 //to get a single user
127.0.0.1:5007/lastusers //to get user registered in the last month
and so on for the other tables i need to query (one route for every kind of request I need in the client);
instead I would like to have only one route, something like:
127.0.0.1:5007/request
(when calling this route I will pass the table name and the conditions' string)
Do you think this solution would be a good solution or you generally use other ways to handle this kind of things?
Do you have any idea on how to write a function to convert the conditions' string into the desired object?
Any suggestion would be appreciated ;)
I would strongly advise you not to expose any part of your database model to your clients. Doing so means you can't change anything you expose without the risk of breaking the clients. One suggestion as far as what you've supplied is that you can and should use query parameters to cut down on the number of endpoints you've got.
GET /users //to get all users
GET /users?registeredInPastDays=30 //to get user registered in the last month
GET /users/1 //to get a single user
Obviously "registeredInPastDays" should be renamed to something less clumsy .. it's just an example.
As far as the conditions string, there ought to be plenty of parsers available online. The grammar looks very straightforward.
IMHO the main disadvantage of your solution is that you are creating just another API for quering data. Why create sthm from scratch if it is already created? You should use existing mature query API and focus on your business logic rather then inventing sthm new.
For example, you can take query syntax from Odata. Many people have been developing that standard for a long time. They have already considered different use cases and obstacles for query API.
Resources are located with a URI. You can use or mix three ways to address them:
Hierarchically with a sequence of path segments:
/users/john/posts/4711
Non hierarchically with query parameters:
/users/john/posts?minVotes=10&minViews=1000&tags=java
With matrix parameters which affect only one path segment:
/users;country=ukraine/posts
This is normally sufficient enough but it has limitations like the maximum length. In your case a problem is that you can't easily describe and and or conjunctions with query parameters. But you can use a custom or standard query syntax. For instance if you want to find all cars or vehicles from Ford except the Capri with a price between $10000 and $20000 Google uses the search parameter
q=cars+OR+vehicles+%22ford%22+-capri+%2410000..%2420000
(the %22 is a escaped ", the %24 a escaped $).
If this does not work for your case and you want to pass data outside of the URI the format is just a matter of your taste. Adding a custom header like X-Filter may be a valid approach. I would tend to use a POST. Although you just want to query data this is still RESTful if you treat your request as the creation of a search result resource:
POST /search HTTP/1.1
your query-data
Your server should return the newly created resource in the Location header:
HTTP/1.1 201 Created
Location: /search/3
The result can still be cached and you can bookmark it or send the link. The downside is that you need an additional POST.

Resources