How to add collections in transformations when writing(creating) a Document in MarkLogic - transform

I wrote a transformation in xquery which unquotes an XML-String and inserts a element with its content. This works fine.
I need to create a collection dependant on the root element of this element as well. I can't do this on new documents as xdmp:document-add-collections() is not working. How do I add the collection to new Documents in transformations?
Here my ServerSide xQuery Code:
xquery version "1.0-ml";
module namespace transform = "http://marklogic.com/rest-api/transform/smtextdocuments";
import module namespace mem = "http://xqdev.com/in-mem-update" at '/MarkLogic/appservices/utils/in-mem-update.xqy';
declare function transform(
$context as map:map,
$params as map:map,
$content as document-node()
) as document-node()
{
let $uri := base-uri($content)
let $doccont := $content/smtextdocuments/documentcontent
let $newcont := xdmp:unquote($doccont)
let $contname := node-name($newcont/*)
let $result := if ( exists($content/smtextdocuments/content))
then mem:node-replace($content/smtextdocuments/content, <content>11{$newcont}</content>)
else mem:node-insert-after($doccont, <content>{$newcont}</content>)
let $log := xdmp:log($content)
return (
$result,
xdmp:document-add-collections($uri, fn:string($contname)),
xdmp:document-remove-collections($uri, "raw")
)
};
The script ist running with the java api (4.0.4) create methode via parameter ServerTransform transform. As per documentation the transformation script is running before the document is stored in the Database.
Its a new document; I need to transform the content and then create the collection.
I can see the document after the create, the content is available. Just the collection is missing. I can try xdmp:document-insert method but is it correct writing the document while create is running?.

The transform mechanism of the Java API / REST API takes responsibility for the document write. At present, there's no way for the transform to supply collections to the writer. That would be a reasonable request for enhancement.
The transform shouldn't attempt to write the document, because the writer would also attempt to write the same document.
One alternative would be to transform the document in Java before writing it and specify the collection as part of the write request.
Another alternative would be to rewrite the transform as a resource service extension, implement the write within the resource service extension, and modify the Java client to send the document to the resource service extension.
Depending on the model, a final alternative might be to use a range index on an element within the document to collect documents into sets instead of using a collection on the document.
Hoping that helps,

What do you mean by "new documents"? Is the document already inserted into the MarkLogic database at the time you are adjusting the collections of it? If not, you may want to modify your return to ($result, xdmp:document-insert($uri, $result, xdmp:default-permissions(), fn:string($contname)) ) for that case.
Otherwise, can you edit your question to share the error or problem more specifically you are facing?

It is a pity that REST transforms do not allow this, like MLCP transforms do. Until changed you have the options drawn by ehennum, or you can consider delaying adding of collections to a pre- or post-commit trigger. It takes some overhead, but it sometimes makes perfect sense to do something like that in a trigger, since it makes sure it is always enforced, and a good place to do content validation, audit logging, and things like that as well.
HTH!

Related

How to use MarkLogic search options by name

I'm using the ML9 Java API to upload a search options file to the DB with a name that I can use later in my search call. I would now like to write an xquery transform to highlight the query-matches in a set of elements in the response. Standard snippets won't work for me since they only bring back the fields in which there are matches and because they may not bring back the complete field value, but only the immediate context of the match.
So I want to use the cts:highlight function in a custom transform and want to pass to it the name of the options that I have uploaded into the DB. My question is how I can best get the options element from the DB using the name passed in to the transform method. I want to use this to construct the cts:query that I can pass in to the cts:highlight call as in:
let $query := cts:query(search:parse($query-string, $options))
let $result := cts:highlight($doc, $query, <markup>{$cts:text}
</markup>)
I was thinking I could pass in the query-string and the name of the pre-loaded options and use these to construct the cts:query, but don't know how to get the options from the name.
I found a way to avoid having to read the options. Setting the option 'return-query' to true adds a search:query node to the search:response which is passed to the transform method as the document-node. I'm then able to get this directly in the transform method to use in cts:highlight as:
let $query := cts:query($response/search:response/search:query/*[1])
The options are stored in the modules database associated with your REST instance. You could theoretically dig them out, though that would be relying on an implementation detail (the URI).
You might look into a combination of extract-document-data, as Sam mentioned, plus a search result transform, rather than the heavier approach of doing your own search through what I'd guess is a read transform.
Another alternative might be a custom snippeter that you pull into your options via transform-results. See http://docs.marklogic.com/guide/search-dev/query-options#id_58295.

Passing sets of properties and nodes as a POST statement wit KOA-NEO4J or BOLT

I am building a REST API which connects to a NEO4J instance. I am using the koa-neo4j library as the basis (https://github.com/assister-ai/koa-neo4j-starter-kit). I am a beginner at all these technologies but thanks to some help from this forum I have the basic functionality working. For example the below code allows me to create a new node with the label "metric" and set the name and dateAdded propertis.
URL:
/metric?metricName=Test&dateAdded=2/21/2017
index.js
app.defineAPI({
method: 'POST',
route: '/api/v1/imm/metric',
cypherQueryFile: './src/api/v1/imm/metric/createMetric.cyp'
});
createMetric.cyp"
CREATE (n:metric {
name: $metricName,
dateAdded: $dateAdded
})
return ID(n) as id
However, I am struggling to know how I can approach more complicated examples. How can I handle situations when I don't know how many properties will be added when creating a new node beforehand or when I want to create multiple nodes in a single post statement. Ideally I would like to be able to pass something like JSON as part of the POST which would contain all of the nodes, labels and properties that I want to create. Is something like this possible? I tried using the below Cypher query and passing a JSON string in the POST body but it didn't work.
UNWIND $props AS properties
CREATE (n:metric)
SET n = properties
RETURN n
Would I be better off switching tothe Neo4j Rest API instead of the BOLT protocol and the KOA-NEO4J framework. From my research I thought it was better to use BOLT but I want to have a Rest API as the middle layer between my front and back end so I am willing to change over if this will be easier in the longer term.
Thanks for the help!
Your Cypher syntax is bad in a couple of ways.
UNWIND only accepts a collection as its argument, not a string.
SET n = properties is only legal if properties is a map, not a string.
This query should work for creating a single node (assuming that $props is a map containing all the properties you want to store with the newly created node):
CREATE (n:metric $props)
RETURN n
If you want to create multiple nodes, then this query (essentially the same as yours) should work (but only if $prop_collection is a collection of maps):
UNWIND $prop_collection AS props
CREATE (n:metric)
SET n = props
RETURN n
I too have faced difficulties when trying to pass complex types as arguments to neo4j, this has to do with type conversions between js and cypher over bolt and there is not much one could do except for filing an issue in the official neo4j JavaScript driver repo. koa-neo4j uses the official driver under the hood.
One way to go about such scenarios in koa-neo4j is using JavaScript to manipulate the arguments before sending to Cypher:
https://github.com/assister-ai/koa-neo4j#preprocess-lifecycle
Also possible to further manipulate the results of a Cypher query using postProcess lifecycle hook:
https://github.com/assister-ai/koa-neo4j#postprocess-lifecycle

Referencing external doc in CouchDB view

I am scraping an 90K record database using JSON-RPC and I am trying to put in some basic error checking. I want to start by scraping the database twice using two different settings and adding a prefix to the second scrape. This way I can check to ensure that the two settings are not producing different records (due to dropped updates, etc). I wanted to implement the comparison using a view which compares each document from the first scrape with it's twin produced by the second scrape and then emit the names of records with a difference between them.
However, I cannot quite figure out how to pull in another doc in the view, everything I have read only discusses external docs using the emit() function, which is too late to permit me to compare it. In the example below, the lookup() function would grab the referenced document.
Is this just not possible?
function(doc) {
if(doc._id.slice(0,1)!=='$' && doc._id.slice(0,1)!== "_"){
var otherDoc = lookup('$test" + doc._id);
if(otherDoc){
var keys = doc.value.keys();
var same = true;
keys.forEach(function(key) {
if ((key.slice(0,1) !== '_') && (key.slice(0,1) !=='$') && (key!=='expires')) {
if (!Object.equal(otherDoc[key], doc[key])) {
same = false;
}
}
});
if(!same){
emit(doc._id, 1);
}
}
}
}
Context
You are correct that this is not possible in CouchDB. The whole point of the map function is that it must be idempotent, otherwise you lose all the other nice benefits of a pre-calculated index.
This is why you cannot access external resources in the map function, whether they be other records or the clock. Any time you run a map you must always get the same result if you put the same record into it. Since there are no relationships between records in CouchDB, you cannot promise that this is possible.
Solution
However, you can still achieve your end goal, just be different means. Some possibilities...
Assuming there is some meaningful numeric value in each doc, you could use a view to take the sum of all those values and group them by which import you did ({key: <batch id>, value: <meaningful number>}). Then compare the two numbers in your client or the browser to see if they match.
A brute force approach would be to use a view to pair the docs that should match. Each doc is on a different row, but they're grouped by a common field. Then iterate through the entire index comparing the pairs. This would certainly be the quickest to code and doesn't depend on your application or data.
Implement a validation function to enforce a schema on your data. Just be warned that this will reduce your write throughput since each written record will be piped out of Erlang and into the JS engine. Also, this is only applicable if you're worried about properly formed records instead of their precise content, which might not be the case.
Instead of your different batch jobs creating different docs, have them place them into the same doc. The structure might look like this: { "_id": "something meaningful", "batch_one": { ..data.. }, "batch_two": { ..data.. } } Then your validation function could compare them or you could create a view that indexes all the docs that don't match. All depends on where in your pipeline you want to do the error checking and correction.
Personally I like the last option better, but only if you don't plan to use the database as is in production. Ie., you wouldn't want to carry around all that extra data in each record.
Hope that helps.
Cheers.

Incremental loading in Azure Mobile Services

Given the following code:
listView.ItemsSource =
App.azureClient.GetTable<SomeTable>().ToIncrementalLoadingCollection();
We get incremental loading without further changes.
But what if we modify the read.js server side script to e.g. use mssql to query another table instead. What happens to the incremental loading? I'm assuming it breaks; if so, what's needed to support it again?
And what if the query used the untyped version instead, e.g.
App.azureClient.GetTable("SomeTable").ReadAsync(...)
Could incremental loading be somehow supported in this case, or must it be done "by hand" somehow?
Bonus points for insights on how Azure Mobile Services implements incremental loading between the server and the client.
The incremental loading collection works by sending the $top and $skip query parameters (those are also sent when you do a query by using the .Take and .Skip methods in the table). So if you want to modify the read script to do something other than the default behavior, while still maintaining the ability to use that table with an incremental loading collection, you need to take those values into account.
To do that, you can ask for the query components, which will contain the values, as shown below:
function read(query, user, request) {
var queryComponents = query.getComponents();
console.log('query components: ', queryComponents); // useful to see all information
var top = queryComponents.take;
var skip = queryComponents.skip;
// do whatever you want with those values, then call request.respond(...)
}
The way it's implemented at the client is by using a class which implements the ISupportIncrementalLoading interface. You can see it (and the full source code for the client SDKs) in the GitHub repository, or more specifically the MobileServiceIncrementalLoadingCollection class (the method is added as an extension in the MobileServiceIncrementalLoadingCollectionExtensions class).
And the untyped table does not have that method - as you can see in the extension class, it's only added to the typed version of the table.

Query in Override of node.tpl

I have override a node.tpl and need some results from db using a query generated by views.
Here is the code which i used:
<?php $res = db_query("SELECT node.nid AS nid, node.title AS node_title FROM node node LEFT JOIN content_field_is_popular node_data_field_is_popular ON node.vid
= node_data_field_is_popular.vid WHERE (node.type in ('article_thisweekend')) AND (UPPER(node_data_field_is_popular.field_is_popular_value)
= UPPER('yes'));");
foreach($res as $reco){
print ($reco->nid);
}
?>
But I am not getting any results.
What I am missing?
Thanks
Matt V. has good advice in that you should try to separate the view templates from the sql query logic.
For this specific example though, you need to use db_fetch_object since $res just contains the
database query result resource
Instead of
foreach($res as $reco){
print ($reco->nid);
}
Do
while ($reco = db_fetch_object($res)){
print ($reco->nid);
}
It's generally best to avoid putting queries directly in your template files. It's best to separate logic and presentation.
Instead, use a module to generate the content you need and pass that along to the theme layer. In this case, if you're already using the Views module to generate the query, let Views run it for you and pass off the data to a page or block display.
Otherwise, to debug the query, try running the query independent of the code, through something like phpMyAdmin or "drush sqlq".

Resources