Bucket Search & Dynamic queries - search

MarkLogic 9.0.8.2
In the database, we have data like this:
<xmldata>
<data>
<name>name1</name>
<value>E012M9876</value>
</data>
<data>
<name>name2</name>
<value>E015M6789</value>
</data>
<data>
<name>name3</name>
<value>E012M9876</value>
</data>
<data>
<name>name1</name>
<value>E015M6789</value>
</data>
</xmldata>
Users can search with any operator, such as "=, <, <=, >=, Between", and the data is dynamic, so we can't create fixed buckets. Queries can look like this:
name1:>=E011M1234 AND name1:<=E015M8921 (will return 2 records)
name1:>E014M8769 (will return 1 record)
name1:<=E013M7659 (will return 1 record)
name2:=E015M6789 (will return 1 record)
I looked around for a dynamic bucket implementation in XQuery, but didn't find any.
https://docs.marklogic.com/guide/rest-dev/search#id_69918
So can you please help with how to write code that implements this scenario?
If storing the data in attributes instead of elements would be a better approach, we can do that too:
<data>
<value name="name1">E015M6789</value>
</data>

One way to solve this problem is to create a TDE that indexes one row per data element with one column each for the name and value.
Then, an SQL or Optic query can match the appropriate rows based on boolean expressions on the value column.
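To make the row model concrete, here is a minimal sketch in plain JavaScript. It makes no MarkLogic calls: the rows array is an in-memory stand-in for the view that such a TDE would produce, and parseQuery and runQuery are hypothetical helper names. It assumes that plain string ordering on the value column is the comparison you want, and that Between is written as a >= term joined to a <= term with AND.

```javascript
// Illustration only: plain JavaScript, no MarkLogic calls.
// Each row mirrors one <data> element: one column for name, one for value.
const rows = [
  { name: 'name1', value: 'E012M9876' },
  { name: 'name2', value: 'E015M6789' },
  { name: 'name3', value: 'E012M9876' },
  { name: 'name1', value: 'E015M6789' },
];

// Comparison operators applied to the value column (plain string ordering).
const operators = {
  '=': (a, b) => a === b,
  '<': (a, b) => a < b,
  '<=': (a, b) => a <= b,
  '>': (a, b) => a > b,
  '>=': (a, b) => a >= b,
};

// Parse "name1:>=E011M1234 AND name1:<=E015M8921" into a list of conditions.
function parseQuery(text) {
  return text.split(/\s+AND\s+/).map((term) => {
    const match = /^([^:]+):(<=|>=|=|<|>)(.+)$/.exec(term.trim());
    if (!match) throw new Error('Unrecognized term: ' + term);
    return { name: match[1], op: match[2], operand: match[3] };
  });
}

// Keep the rows that satisfy every condition in the query.
function runQuery(text) {
  const conditions = parseQuery(text);
  return rows.filter((row) =>
    conditions.every(
      (c) => row.name === c.name && operators[c.op](row.value, c.operand)
    )
  );
}
```

Against the sample data, the four queries from the question select 2, 1, 1 and 1 rows respectively. In MarkLogic itself the same boolean expressions would go into the WHERE clause of the SQL or Optic query rather than into a JavaScript filter.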
Hoping that helps,

Related

Marklogic faceted search and collations

I'm setting up a faceted search in MarkLogic. I have the following range indexes configured:
That is, I have two indexes. The first is on namespace http://www.corbas.co.uk/ns/presentations and local name keyword. The second has the local name level. The collation URI for both is http://marklogic.com/collation/en/S1.
When I try to search using the following I see errors related to collations:
xquery version "1.0-ml";
import module namespace search = "http://marklogic.com/appservices/search"
at "/MarkLogic/appservices/search/search.xqy";
search:search("levels:Intermediate",
<options xmlns="http://marklogic.com/appservices/search">
<return-results>true</return-results>
<return-facets>true</return-facets>
<constraint name="keywords" facet="true">
<range type="xs:string" collation="http://marklogic.com/collation/en/S1">
<element ns="http://www.corbas.co.uk/ns/presentations" name="keyword"/>
</range>
</constraint>
<constraint name="levels" facet="true">
<range type="xs:string" collation="http://marklogic.com/collation/en/S1">
<element ns="http://www.corbas.co.uk/ns/presentations" name="level"/>
</range>
</constraint>
</options>)
I get the following error:
XDMP-ELEMRIDXNOTFOUND: cts:search(fn:collection(),
cts:element-range-query(fn:QName("http://www.corbas.co.uk/ns/presentations","level"),
"=", "Intermediate", ("collation=http://marklogic.com/collation/en/S1"), 1),
("score-logtfidf", "faceted", cts:score-order("descending")),
xs:double("1"), ()) -- No string element range index for
{http://www.corbas.co.uk/ns/presentations}level
collation=http://marklogic.com/collation/en/S1
What am I doing wrong?
Strange message. If it even got that far, it looks like your database's default collation has been changed. This does not answer the question; it is just strange.
First off, I would always add the collation to the constraint:
<search:range type="xs:string" facet="true"
collation="http://marklogic.com/collation/en/S1">
Second, I always troubleshoot range index issues from the Query Console:
Use cts:values() to verify that your indexes are in place, and in the namespace and collation you expect. This removes the other layers and confirms that the index is what you think it is.
And another item: MarkLogic range indexes do not exist until content is indexed. Are you sure you have not turned off auto-index on the database and perhaps content is not indexed? That would give you an error.
To be honest, I would have expected a different error message. I would have expected MarkLogic to complain it couldn't find an index for root collation, because you have not added collation attributes on the range elements in the search options.
Maybe adding those will help.
HTH!
It looks to me like your configuration is correct, which suggests to me that the problem is timing. Once you specify what indexes you want, MarkLogic gets to work creating them. If you run a query that requires those indexes before MarkLogic finishes creating them, you get this error. Depending on the amount of content you have, the creation process can be very quick or take hours.
To check the status, point your browser to the Admin UI (http://localhost:8001) and navigate to the configuration page for your database. Click on the Status tab and look for "Reindexing/Refragmenting State"—if MarkLogic is still reindexing, it will tell you so here and you'll get updates on its progress. (You can also get this information through the Management API.)

Sourcing data from DocumentDB in Hadoop

I have a Hadoop application that sources data from two different DocumentDB collections. However, the JSON schemas of the documents in these two collections are different. Both have a field holding a time, but one is called TimeStamp and the other is called UpdatedOn. I'd like to know how I can specify a query based on this time field and retrieve only those JSON documents that satisfy the condition in my query. I specify my query like below:
String query = "SELECT * FROM c WHERE c.Timestamp > " + timestamp;
conf.set(ConfigurationUtil.QUERY, query);
This query applies to only one of the collections. I need a query like the one below:
"SELECT * FROM collection1 as c1, collection2 as c2 WHERE c1.Timestamp > x1 OR c2.UpdatedOn > x1"
Is this supported in DocumentDB?
This is not documented, so it is presumably not supported. Your best bet is to execute the two queries separately and then merge the results using LINQ or any other technique to get one result set.
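As a rough sketch of the merge step, written in plain JavaScript rather than LINQ: the two arrays below stand in for the results of the two separate queries, each already filtered by its own time field. The function name mergeByTime and the output shape are hypothetical, and the sketch assumes both time fields hold comparable numbers.

```javascript
// Illustration only: the inputs stand in for the results of two separate
// queries, one per collection, each already filtered on its own time field.
function mergeByTime(firstResults, secondResults) {
  // Normalize the differently named time fields into one common "time" key.
  const tagged = [
    ...firstResults.map((doc) => ({ time: doc.Timestamp, source: 'collection1', doc })),
    ...secondResults.map((doc) => ({ time: doc.UpdatedOn, source: 'collection2', doc })),
  ];
  // Order the combined result set by time, oldest first.
  return tagged.sort((a, b) => a.time - b.time);
}
```

Keeping the original document under doc and recording its source means downstream code can still tell the two schemas apart after the merge.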
Hope this helps.

marklogic, howto create range on document properties

<?xml version="1.0" encoding="UTF-8"?>
<prop:properties xmlns:prop="http://marklogic.com/xdmp/property">
<publicationDate type="string" xmlns="http://marklogic.com/xdmp/json/basic">2015-03-30</publicationDate>
<identifier type="string" xmlns="http://marklogic.com/xdmp/json/basic">2629</identifier>
<posix type="string" xmlns="http://marklogic.com/xdmp/json/basic">nobs</posix>
</prop:properties>
I have a document with the properties shown above.
I want to filter by "PublicationDate" ...
I tried "Fields", "Field Range Indexes" and "Element Range Indexes", but I cannot find the syntax (XML or JSON) to designate this property.
Does anyone know this syntax?
kind regards
In addition to the answers that give examples, please keep in mind that the element publicationDate is NOT in the namespace http://marklogic.com/xdmp/property in your example. Your index configuration should therefore use the json/basic namespace declared on each element, and any reference to it as an xs:QName should not use the "prop:" prefix.
Trying to figure out if your index is correct? You can always try cts:values() from the query console and verify that your index is exactly where you expect it before using it in code.
After many trials, this is what seems to work fine (MarkLogic 8.0-3) :
Without "Field" (where wm is http://marklogic.com/xdmp/json/basic ):
qb.propertiesFragment(qb.value(qb.element(wm,'publicationDate'),'2015-03-30'))
is ok, but the following produces the same error (No element range index ...)
qb.propertiesFragment(qb.range(qb.element(wm,'publicationDate'), '>=' ,'2015-03-01'))
With "Field"
(wm:publicationDate, with wm in Path namespaces, WITHOUT /vm:properties/ before ...) the following seem to work fine :-)))
qb.propertiesFragment(qb.value(qb.field("properties_publicationDate"),'2015-03-30'))
qb.propertiesFragment(qb.range(qb.field("properties_publicationDate"), '>=' ,'2015-03-01'))
I think you are looking for cts:properties-query:
cts:properties-query(
  cts:element-range-query(
    xs:QName("my:publicationDate"), ">",
    current-dateTime() - xs:dayTimeDuration("P1D")))
This example assumes a range index on prop:publicationDate, and also note that this assumes MarkLogic 7 or earlier. In MarkLogic 8, the name of this query appears to have changed to cts:properties-fragment-query.
In node.js, using the query builder, you could achieve something similar:
db.documents.query(
qb.where(
qb.fragmentScope('properties'),
qb.propertiesFragment(
qb.range('publicationDate', '>', ... )
)
)
)

Search Documents from two collections in MarkLogic

In MarkLogic, I want to search across two collections by joining the id element of a document in collection1 to the id element of a document in collection2. When they match, I need the resulting documents from both collections.
I have the code below, but it is very slow. How can I use cts:search or search:search to achieve the same?
for $i in collection('demographic')/individual,
$j in collection('membership')/membership[enrolleIndividualId/id/text() = $i/individual/id/text()])
return {$i,$j}
Update:
I should note that your sample is not valid XQuery: return element root { $i, $j } would be valid. Also, you should not use the /text() node selector, as its behavior can be counterintuitive. You can compare elements directly in an XPath predicate ([enrolleIndividualId/id eq $i/individual/id]). Use /fn:string() in place of /text() if you need the contents of an element as a string. I'd also recommend using the atomic equality operator eq in place of the sequence equality operator = when directly comparing individual elements.
Original Answer:
There are several approaches to implementing joins in MarkLogic, but I would first question your data model. From the names of the elements in your sample query, it looks like you are using a relational model (individuals have memberships). MarkLogic is a document database, and it's optimized for denormalized documents. You will be much better served to process your data and generate new individual documents that each contain the relevant membership data.
That being said, here's how you could join your documents:
First, you will need range indices to write performant joins. If the id element from your sample query is not unique to individuals, you will need path range indices on enrolledIndividualId/id and individual/id, otherwise, a simple element range index on id will do.
The most common join pattern in MarkLogic uses a "shotgun-OR" query; first retrieving values from the lexicon backing a range index, and then constructing an or-query from those values to retrieve the relevant documents. This won't work directly in your case, as you want to retrieve both sides of the join. You can either run a search for each pair of documents, or run a single search for one side, and then an additional document read for each document.
pairs:
for $value in cts:values(cts:path-reference("individual/id"))
return
cts:search(/,
cts:or-query((
cts:and-query((
cts:collection-query("demographic"),
cts:path-range-query("individual/id", "=", $value))),
cts:and-query((
cts:collection-query("membership"),
cts:path-range-query("enrolledIndividualId/id", "=", $value))))),
"unfiltered")
shotgun-OR plus iteration:
for $doc in
cts:search(/,
cts:and-query((
cts:collection-query("demographic"),
cts:path-range-query("individual/id", "=",
cts:values(cts:path-reference("individual/id"))))),
"unfiltered")
return
cts:search(/,
cts:and-query((
cts:collection-query("membership"),
cts:path-range-query("enrolledIndividualId/id", "=", $doc/individual/id))),
"unfiltered")
As you can see, each approach requires I/O proportionate to the number of docs/values you want to join. If you only needed the shotgun-OR (i.e., a query for documents based on criteria from other documents), you would only need to make two requests: the initial cts:values() call to retrieve values from a lexicon, and the cts:search() call using a query built from those values.
Note: the cts:query objects used in these examples could be used in conjunction with the Search API by means of the search:resolve() function.
Given your apparent data model, you will be much better served by processing your data into individual, de-normalized documents.
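As a rough illustration of that denormalization step, here is a sketch in plain JavaScript over in-memory objects, with no MarkLogic calls. The function name denormalize and the flat object shapes are hypothetical, and the enrolledIndividualId field follows the spelling used in the queries above. It assumes each membership carries the id of the individual it belongs to.

```javascript
// Illustration only: embed each individual's memberships in the individual,
// so one document read returns both sides of the former join.
function denormalize(individuals, memberships) {
  // Group memberships by the id of the individual they refer to.
  const byIndividual = new Map();
  for (const membership of memberships) {
    const key = membership.enrolledIndividualId;
    if (!byIndividual.has(key)) byIndividual.set(key, []);
    byIndividual.get(key).push(membership);
  }
  // Produce one combined record per individual (empty list if none match).
  return individuals.map((individual) => ({
    ...individual,
    memberships: byIndividual.get(individual.id) || [],
  }));
}
```

Grouping through a Map makes one pass over each input, instead of scanning every membership for every individual as the nested FLWOR in the question does.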

MarkLogic - node.js Client API - queryBuilder query array of IDs

This question is similar to:
MarkLogic - XQuery - cts:element-range-query using variable length sequence or map
But this time I need to do the query using the queryBuilder in the node.js client API.
I have a collection of 100,000 records structured like this:
<record>
<pk>1</pk>
<id>1234</id>
</record>
<record>
<pk>2</pk>
<id>1234</id>
</record>
<record>
<pk>3</pk>
<id>5678</id>
</record>
<record>
<pk>4</pk>
<id>5678</id>
</record>
I have set up a range index on id.
I want to write a query using the queryBuilder node.js client API that will allow me to pass in an array of IDs and get out a list of records.
It needs to:
1) query a specific collection
2) leverage the range indexes for performance
Never mind, I figured out the problem.
db.db.documents.query(
q.where(
q.collection('Records'),
q.or(
q.value('id', ['1', '2'])
)
).slice(1, 99999999)
)
I originally tried to pass an array into q.value and I was only getting limited results (Got 10 when I expected 20). So I was under the impression that I was doing it wrong.
It turns out I just needed to slice the where clause to include everything. Apparently if you don't specify how much to take it defaults to 10.
Also note that when I tried .slice(0) which would have been preferred, I got an exception.

Resources