This question is similar to:
MarkLogic - XQuery - cts:element-range-query using variable length sequence or map
But this time I need to do the query using the queryBuilder in the node.js client API.
I have a collection of 100,000 records structured like this:
<record>
<pk>1</pk>
<id>1234</id>
</record>
<record>
<pk>2</pk>
<id>1234</id>
</record>
<record>
<pk>3</pk>
<id>5678</id>
</record>
<record>
<pk>4</pk>
<id>5678</id>
</record>
I have set up a range index on id.
I want to write a query using the queryBuilder node.js client API that will allow me to pass in an array of IDs and get out a list of records.
It needs to:
1) query a specific collection
2) leverage the range indexes for performance
Never mind, I figured out the problem.
db.documents.query(
  q.where(
    q.collection('Records'),
    q.or(
      q.value('id', ['1', '2'])
    )
  ).slice(1, 99999999) // without slice() only the first 10 results come back
)
I originally tried to pass an array into q.value and I was only getting limited results (Got 10 when I expected 20). So I was under the impression that I was doing it wrong.
It turns out I just needed to slice the where clause to include everything. Apparently if you don't specify how much to take it defaults to 10.
Also note that when I tried .slice(0) which would have been preferred, I got an exception.
MarkLogic 9.0.8.2
In database, we have data like this
<xmldata>
<data>
<name>name1</name>
<value>E012M9876</value>
</data>
<data>
<name>name2</name>
<value>E015M6789</value>
</data>
<data>
<name>name3</name>
<value>E012M9876</value>
</data>
<data>
<name>name1</name>
<value>E015M6789</value>
</data>
</xmldata>
Users can search with any operator such as =, <, <=, >= or Between, and the data are dynamic, so we can't create fixed buckets. Queries can look like this:
name1:>=E011M1234 AND name1:<=E015M8921 (will return 2 records)
name1:>E014M8769 (will return 1 record)
name1:<=E013M7659 (will return 1 record)
name2:=E015M6789 (will return 1 record)
I looked around for a dynamic bucket implementation in XQuery, but didn't find any.
https://docs.marklogic.com/guide/rest-dev/search#id_69918
So can you please help with how to write code to implement this scenario?
If storing the data in attributes instead of elements would be a better approach, we can also do that:
<data>
<value name="name1">E015M6789</value>
</data>
One way to solve this problem is to create a TDE that indexes one row per data element with one column each for the name and value.
Then, an SQL or Optic query can match the appropriate rows based on boolean expressions on the value column.
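As a rough illustration of the one-row-per-data-element idea, the sketch below models the rows such a view might expose as plain JavaScript objects and evaluates the sample searches from the question against them. It is not MarkLogic code: the rows array, the OPERATORS table and the matchRows helper are hypothetical names, and plain JavaScript string comparison stands in for whatever collation the real index would use.

```javascript
// Hypothetical in-memory stand-in for the rows a name/value view would expose.
const rows = [
  { name: "name1", value: "E012M9876" },
  { name: "name2", value: "E015M6789" },
  { name: "name3", value: "E012M9876" },
  { name: "name1", value: "E015M6789" },
];

// Comparison operators a user may pick; strings are compared by code unit here.
const OPERATORS = {
  "=": (a, b) => a === b,
  "<": (a, b) => a < b,
  "<=": (a, b) => a <= b,
  ">": (a, b) => a > b,
  ">=": (a, b) => a >= b,
};

// Return the rows for `name` whose value satisfies every [operator, operand] pair.
function matchRows(allRows, name, conditions) {
  return allRows.filter(
    (row) =>
      row.name === name &&
      conditions.every(([op, operand]) => OPERATORS[op](row.value, operand))
  );
}

// "Between" is expressed as a >= condition combined with a <= condition.
const between = matchRows(rows, "name1", [
  [">=", "E011M1234"],
  ["<=", "E015M8921"],
]);
console.log(between.length); // 2
```

Because every operator reduces to a boolean test on the value column, no fixed buckets are needed; an SQL or Optic query over the real view would express the same conditions in its own syntax.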
Hoping that helps,
The database is in the Azure cloud and is not currently used in production. There are 80,000 rows, and uprn is a VARCHAR(100) column.
I'm already using Joi to validate each UPRN as well.
I'm using Knex with a SQL Server database with the following whereIn query:
knex(LOCATIONS.table).whereIn(LOCATIONS.uprn, req.body.uprns)
but this takes 8-12s to complete and sometimes times out. If I call .toQuery() on the same thing and run the resulting SQL in SSMS, it returns the result within 1-2s.
If I do a raw query, the resulting .toQuery() or toString() works in SSMS and returns results. But if I try to use the raw directly, it will return 0 results.
I'm looking to either fix what's making whereIn so slow or get the raw query working.
EDIT 1:
After much debugging and trial and error, it seems that the bug is due to how Knex deals with arrays, so I wrote a for-of loop that adds one ? placeholder per array element and then passed the array in as the parameters.
This led me to realise that the performance issue is due to the way SQL Server handles parameterised queries.
I ended up building a raw query string with all of the parameters and validating the input with Joi string/regex config:
Joi.string()
.min(1)
.max(35)
.regex(/^[a-z\d\-_\s]+$/i)
allowing only alphanumeric characters, dashes, underscores and whitespace, which should prevent SQL injection.
I'm going to look deeper into security issues with this and might make a separate login that can only SELECT data from that table and nothing more to run with these queries.
In short, I needed to build the raw query myself and validate the input separately.
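The placeholder step from EDIT 1 can be sketched without a database connection. The helper below is hypothetical (buildInClause is not part of Knex): it checks each value against the same pattern and length limits as the Joi rule, then returns an SQL fragment with one ? per value together with the bindings array, which could then be handed to something like knex.raw(sql, bindings). It only shows the mechanics; as noted in the post, the parameterised form was still slow on SQL Server.

```javascript
// Same pattern as the Joi rule: letters, digits, dashes, underscores, whitespace.
const UPRN_PATTERN = /^[a-z\d\-_\s]+$/i;

// Hypothetical helper: validate the values, then emit one "?" placeholder each.
// `column` is assumed to be a trusted constant from the code, never user input.
function buildInClause(column, values) {
  if (!Array.isArray(values) || values.length === 0) {
    throw new Error("expected a non-empty array of values");
  }
  for (const value of values) {
    const ok =
      typeof value === "string" &&
      value.length >= 1 &&
      value.length <= 35 &&
      UPRN_PATTERN.test(value);
    if (!ok) {
      throw new Error(`invalid value: ${JSON.stringify(value)}`);
    }
  }
  const placeholders = values.map(() => "?").join(", ");
  return { sql: `${column} IN (${placeholders})`, bindings: values.slice() };
}

const clause = buildInClause("uprn", ["100012345", "AB-12 3"]);
console.log(clause.sql); // uprn IN (?, ?)
```

Rejecting bad input before any SQL is assembled keeps the validation rule in one place, whichever way the final query is built.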
I have defined a model like
from mongoengine import Document, fields

class Orders(Document):
    orderAmount = fields.FloatField()
    cashbackAmount = fields.FloatField()
    meta = {'strict': False}
I want to get all orders where orderAmount - cashbackAmount > 500, and I want to perform this operation with Mongoengine. I am not using the Django framework, so I cannot use solutions that depend on it.
Let's first approach this as if you had to do it without Mongoengine. You would start by dividing the problem into two steps:
1) How to get the difference between two fields and output it as the new field?
2) How to filter all the documents based on that field's value?
You can see that it consists of several steps, so it looks like a great use case for the aggregation framework.
The first problem can be solved using the $addFields stage and the $subtract operator:
{$addFields: {difference: {$subtract: ["$a", "$b"]}}}
which can be read as "for every document, add a new field called difference where difference = a - b".
The second problem is a simple filtering:
{$match: {difference:{$gt: 500}}}
"give me all documents where difference field is greater than 500"
So the whole query in MongoDB would look like this
db.collectionName.aggregate([{$addFields: {difference: {$subtract: ["$a", "$b"]}}}, {$match: {difference:{$gt: 500}}}])
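Now we have to translate it into Mongoengine. It turns out that there is an aggregate method defined on the queryset, so only small adjustments are needed to make this query work (here Diff is the document class and a and b are the two fields; for the model in the question these would be Orders, orderAmount and cashbackAmount):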
Diff.objects.aggregate({"$addFields": {"difference": {"$subtract": ["$a", "$b"]}}}, {"$match": {"difference":{"$gt": 500}}})
As a result, you get a CommandCursor. You can iterate over that object or convert it to a list to get a list of dictionaries.
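For reference, here is the same two-stage pipeline with the question's field names substituted for a and b, written as a small JavaScript sketch in the style of the shell query above. The buildDifferencePipeline helper is a hypothetical name used only for illustration; the array it returns has the same shape as the stages shown earlier.

```javascript
// Hypothetical helper: build the pipeline for "minuend - subtrahend > threshold".
function buildDifferencePipeline(minuendField, subtrahendField, threshold) {
  return [
    // Stage 1: add a computed field holding the difference of the two fields.
    {
      $addFields: {
        difference: { $subtract: ["$" + minuendField, "$" + subtrahendField] },
      },
    },
    // Stage 2: keep only documents whose difference exceeds the threshold.
    { $match: { difference: { $gt: threshold } } },
  ];
}

const pipeline = buildDifferencePipeline("orderAmount", "cashbackAmount", 500);
console.log(JSON.stringify(pipeline, null, 2));
```

Keeping the field names as parameters means the same two stages can be reused for any pair of numeric fields.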
I am trying to filter records using a key. For example:
localhost:5984/school/_design/school/_view/schoolstats?startkey=[Name,DOB,AGE]&endkey=[Name,DOB,AGE]
1)filter using Name only
2)filter using Name and Age only
3)filter using Name and DOB only
4)filter using Age only
I have tried a lot with wildcards in CouchDB but am not able to fetch the exact result.
If you use a range query with startkey and endkey, the order of the key elements cannot be changed.
These combinations should work. If you need another combination, you have to emit your multikey in the desired form.
startkey=[Name]&endkey=[Name,{}]
startkey=[Name,DOB]&endkey=[Name,{}]
startkey=[Name,DOB,AGE]&endkey=[Name,{}]
startkey=[Name,DOB]&endkey=[Name,DOB,{}]
For the filters you asked about, you have to emit two more keys/multikeys in your map function.
If you do not change the emitted keys, i.e. you have an emit statement like emit([Name, DOB, AGE], null), then:
1)Filter using Name only
startkey=[Name]&endkey=[Name,{}]
2)Filter using Name and Age only
Not possible with this key order.
3)Filter using Name and DOB only
startkey=[Name,DOB]&endkey=[Name,DOB,{}]
4)Filter using Age only
Not possible with this key order.
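The prefix ranges above can be generated with a small helper. This is only a sketch: prefixRange is a hypothetical name, and "Alice" and "2001-05-17" are made-up sample values. It relies on two facts about CouchDB views: key parameters must be JSON-encoded in the URL, and an empty object {} collates after strings, numbers and arrays, so it serves as an upper bound for everything sharing the prefix.

```javascript
// Hypothetical helper: build startkey/endkey for all rows whose key starts with `prefix`.
function prefixRange(prefix) {
  const startkey = JSON.stringify(prefix);
  // {} sorts after any string, number or array in view collation.
  const endkey = JSON.stringify([...prefix, {}]);
  return (
    "startkey=" + encodeURIComponent(startkey) +
    "&endkey=" + encodeURIComponent(endkey)
  );
}

// Filter by Name only, then by Name and DOB.
console.log(prefixRange(["Alice"]));
console.log(prefixRange(["Alice", "2001-05-17"]));
```

The Name-and-Age and Age-only filters would need the map function to emit additional keys in that order (for example [Name, AGE] and [AGE]), after which the same helper applies to the new view.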
<?xml version="1.0" encoding="UTF-8"?>
<prop:properties xmlns:prop="http://marklogic.com/xdmp/property">
<publicationDate type="string" xmlns="http://marklogic.com/xdmp/json/basic">2015-03-30</publicationDate>
<identifier type="string" xmlns="http://marklogic.com/xdmp/json/basic">2629</identifier>
<posix type="string" xmlns="http://marklogic.com/xdmp/json/basic">nobs</posix>
</prop:properties>
I have a document with the properties above.
I want to filter by "publicationDate".
I tried "Fields", "Field Range Indexes" and "Element Range Indexes", but I cannot find the syntax (XML or JSON) to designate this property.
Does anyone know this syntax?
kind regards
In addition to the answers that give examples, please keep in mind that the element publicationDate is NOT in the namespace http://marklogic.com/xdmp/property in your example. So your index configuration should use the json/basic namespace declared on each element, and references to it as an xs:QName should not use the "prop:" prefix.
Trying to figure out if your index is correct? You can always try cts:values() from the query console and verify that your index is exactly where you expect it before using it in code.
After many trials, this is what seems to work fine (MarkLogic 8.0-3) :
Without "Field" (where wm is http://marklogic.com/xdmp/json/basic ):
qb.propertiesFragment(qb.value(qb.element(wm,'publicationDate'),'2015-03-30'))
is ok, but the following produces the same error (No element range index ...)
qb.propertiesFragment(qb.range(qb.element(wm,'publicationDate'), '>=' ,'2015-03-01'))
With "Field"
(wm:publicationDate, with wm in Path namespaces, WITHOUT /vm:properties/ before ...) the following seem to work fine :-)))
qb.propertiesFragment(qb.value(qb.field("properties_publicationDate"),'2015-03-30'))
qb.propertiesFragment(qb.range(qb.field("properties_publicationDate"), '>=' ,'2015-03-01'))
I think you are looking for cts:properties-query:
cts:properties-query(
  cts:element-range-query(
    xs:QName("my:publicationDate"), ">",
    current-dateTime() - xs:dayTimeDuration("P1D")))
This example assumes a range index on my:publicationDate (with the my prefix bound to the element's namespace), and also note that it assumes MarkLogic 7 or earlier. In MarkLogic 8, the name of this query appears to have changed to cts:properties-fragment-query.
In node.js, using the query builder, you could achieve something similar:
db.documents.query(
  qb.where(
    qb.fragmentScope('properties'),
    qb.propertiesFragment(
      qb.range('publicationDate', '>', ... )
    )
  )
)