How to query a design view with partially unknown complex nested keys - CouchDB

I have access to a CouchDB view which emits documents whose keys are two arrays of four integers each: [[int, int, int, int], [int, int, int, int]]. In a concrete example, these correspond to the start date and end date of the document:
[[2017, 5, 5, 10], [2017, 7, 2, 11]]
   Y   m  d   H      Y   m  d   H
I'm able to get documents matching an exact period:
request="localhost/dbname/_design/a/_view/period"
request+="?key=\[\[2017,5,5,10\],\[2017,7,2,11\]\]"
curl -sX GET $request
Question: how can I ignore the "hour" field H?
What if the boundaries are partially unknown? How do I get all documents within a given period, like 2017-05-05 until 2017-07-02? In other words, how can I ignore the last column of each boundary?
I tried to use startkey and endkey:
request="localhost/dbname/_design/a/_view/period"
request+="?startkey=\[\[2017,5,5\],\[2017,7,2\]\]"
request+="&endkey=\[\[2017,5,5,\{\}\],\[2017,7,2,\{\}\]\]"
curl -sX GET $request
This does not work: the lower bound is respected, but the upper bound is not, e.g.:
[[2017,4,5,10],[2017,7,2,12]] <- excluded, OK
[[2017,5,5,10],[2017,7,2,12]] <- contained, OK
[[2017,5,5,11],[2017,7,2,12]] <- contained, OK
[[2017,5,5,10],[2017,8,2,12]] <- contained, ERROR

It's not possible. With complex keys, startkey and endkey can only perform partial matches by dropping items from the right-hand end of the key array, not by omitting items in the middle of it.
Without knowing how your documents are structured, it's difficult to offer more than generic advice. I'd look to emit a single timestamp vector and use startkey and endkey to find the range, rather than trying to use the range as the key. However, this approach might not fit your model.
Otherwise, as suggested above, using Mango may be your best bet.
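To see why the inner fields can't be skipped, here is a minimal Python sketch (not CouchDB itself) that mimics left-to-right key collation. `float("inf")` is a stand-in for CouchDB's `{}`, which sorts after any number:

```python
# Complex keys collate element by element, left to right, so the first
# differing field decides the ordering -- later fields are never reached.
# float("inf") stands in for CouchDB's {} (sorts after numbers).
HIGH = float("inf")

start = [[2017, 5, 5], [2017, 7, 2]]
end = [[2017, 5, 5, HIGH], [2017, 7, 2, HIGH]]

# End date 2017-08-02 is outside the intended period...
key = [[2017, 5, 5, 10], [2017, 8, 2, 12]]

# ...yet the key still falls inside the range: the comparison is already
# decided by the first array (10 < HIGH) before the second array with the
# out-of-range month is ever looked at.
print(start <= key <= end)  # True -- the ERROR case from the question
```

This mirrors the `[[2017,5,5,10],[2017,8,2,12]]` ERROR row above: the range test never reaches the second boundary.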

You can drop trailing fields from the start key and append the empty object {} to the end key ({} sorts after any number in CouchDB's collation).
So 2017-05-05 until 2017-07-02 would be:
[2017,05,05] to [2017,07,02,{},{},{}]
You can refer to this answer: couchdb search or filtering on key array
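As a sanity check, here is a small Python sketch (an analogy, not CouchDB) of why right-truncation works for a flat date key: trailing fields only refine the ordering, so dropping them from the start key and padding the end key with a high sentinel (`float("inf")` standing in for `{}`) captures the whole period:

```python
HIGH = float("inf")  # stand-in for {} -- sorts after any number

keys = [
    [2017, 4, 30, 9],   # before the period
    [2017, 5, 5, 10],
    [2017, 6, 1, 0],
    [2017, 7, 2, 23],
    [2017, 7, 3, 0],    # after the period
]
start, end = [2017, 5, 5], [2017, 7, 2, HIGH]
hits = [k for k in keys if start <= k <= end]
print(hits)  # only the three keys from 2017-05-05 through 2017-07-02
```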

Related

Map repeated values in presto

I'm extracting data from JSON and mapping two arrays in Presto. It works fine when there are no repeated values in the array, but fails with the error "Duplicate map keys are not allowed" if any of the values are repeated. I need those values and cannot remove any of them from the array. Is there a workaround for this scenario?
Sample values:
array1 -- [Rewards,NEW,Rewards,NEW]
array2 -- [losg1,losg2,losg3,losg4]
Map key/value has to be generated like this [Rewards=>losg1,NEW=>losg2,Rewards=>losg3,NEW=>losg4]
Pairs of associations can be returned like this:
SELECT ARRAY[ROW('Rewards', 'losg1'), ROW('NEW', 'losg2'), ROW('Rewards', 'losg3'), ROW('NEW', 'losg4')]
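If a true map type isn't strictly required downstream, an array of key/value pairs side-steps the duplicate-key restriction entirely. A quick Python sketch of the same idea (pairs instead of a map):

```python
array1 = ["Rewards", "NEW", "Rewards", "NEW"]
array2 = ["losg1", "losg2", "losg3", "losg4"]

# Pairs preserve every association, including repeated keys...
pairs = list(zip(array1, array2))
print(pairs)  # [('Rewards', 'losg1'), ('NEW', 'losg2'), ('Rewards', 'losg3'), ('NEW', 'losg4')]

# ...whereas a real map keeps only one value per key.
print(dict(pairs))  # {'Rewards': 'losg3', 'NEW': 'losg4'}
```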

How do we delete a batch of constraints in docplex?

I tried a list of constraint indices and a list of constraint names, but neither works with m.remove_constraints(). I added the constraints like this:
m.add(sumbs[i] <= a[i], "ct_sum_%d" % i)
A.append("ct_sum_%d" % i)
Then later, when I want to change the model: m.remove_constraints(A)
What is the correct way of doing this?
Model.add_constraints() returns the list of newly added constraints.
Similarly, Model.remove_constraints() takes a collection of constraint objects, not names or indices. For example:
cts = mdl.add_constraints(...)
mdl.remove_constraints(cts[:3]) # remove the first three
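If you want to keep addressing constraints by the names you generated, one pattern is to store the objects the model returns in your own dict keyed by name, then translate names back to objects before removing. A minimal sketch of the bookkeeping (plain Python stand-ins instead of real docplex constraint objects, so it runs anywhere):

```python
# name -> constraint object; the values here are stand-ins for whatever
# Model.add_constraint()/add_constraints() would return.
constraints = {}
for i in range(4):
    ct = object()                 # pretend this is mdl.add_constraint(...)
    constraints["ct_sum_%d" % i] = ct

# Later: translate names into the objects remove_constraints() expects.
names_to_drop = ["ct_sum_0", "ct_sum_2"]
objs = [constraints[name] for name in names_to_drop]
# mdl.remove_constraints(objs)   # pass objects, not names
for name in names_to_drop:       # keep the bookkeeping in sync
    del constraints[name]
```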

Can a comparator function be made from two conditions connected by an 'and' in python (For sorting)?

I have a list of type:
ans=[(a,[b,c]),(x,[y,z]),(p,[q,r])]
I need to sort the list using the following condition:
if (ans[j][1][1]>ans[j+1][1][1]) or (ans[j][1][1]==ans[j+1][1][1] and ans[j][1][0]<ans[j+1][1][0]):
# do something (like swap(ans[j],ans[j+1]))
I was able to implement this using bubble sort, but I want a faster sorting method.
Is there a way to sort my list using sort() or sorted() (with a comparator or something similar) while respecting my condition?
You can pass a key function that returns a tuple; tuples are compared element by element from left to right until one side is "larger" than the other. Note that your swap condition orders ascending on ans[j][1][1] but descending on ans[j][1][0] for ties, so (assuming the values are numeric) you need to negate the second component. Your input/output example is quite sparse, but I believe this will result in what you want:
def my_compare(x):
    return x[1][1], -x[1][0]
ans.sort(key=my_compare)
# ans = sorted(ans, key=my_compare)
Essentially this first compares the x[1][1] values of two items, and if they are equal it compares their x[1][0] values in reverse. You can rearrange and add more components as you wish if this doesn't match your use case perfectly.
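For a concrete check, here is a runnable example with made-up numeric data. The bubble-sort condition swaps when ans[j][1][1] > ans[j+1][1][1], or on a tie when ans[j][1][0] < ans[j+1][1][0], i.e. ascending on [1][1] and descending on [1][0]; with numeric values the tie-breaker can be negated:

```python
# Ascending on item[1][1]; on ties, descending on item[1][0].
ans = [("a", [3, 2]), ("b", [1, 2]), ("c", [5, 2]), ("d", [1, 7])]
ans.sort(key=lambda t: (t[1][1], -t[1][0]))
print(ans)  # [('c', [5, 2]), ('a', [3, 2]), ('b', [1, 2]), ('d', [1, 7])]
```

The three ties on [1][1] == 2 come out with the larger [1][0] first, exactly as the swap condition demands.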

How do I get all hits from a cts:search() in Marklogic

I have a collection containing lots of documents.
When I search the collection, I need a list of matches independent of the documents. If I search for the word "pie", I get back a list of documents, properly sorted by relevance. However, some of these documents contain the word "pie" in more than one place. I would like a list of all matches, unrelated to the document where each match was found. Also, this list of hits needs to be sorted by relevance (weight), again totally independent of the document (not grouped by document).
The following code searches and returns matches grouped by document:
let $searchfor := "pie"
let $query := cts:and-query((
cts:element-word-query(xs:QName("title"), ($searchfor), (), 16),
cts:element-word-query(xs:QName("para"), ($searchfor), (), 10)
))
let $resultset := cts:search(fn:collection("docs"), $query)[0 to 100]
for $n in $resultset
return cts:score($n)
What I need is $n to be the "match-node", not a "document-node"...
Thanks!
Document relevance is determined by TFIDF. Matches contribute to a document's score but don't have scores relative to each other. cts:search already returns results ordered by document relevance, so you could do this to get match nodes ordered by their ancestor document score:
let $searchfor := "pie"
let $query := cts:and-query((
cts:element-word-query(xs:QName("title"), ($searchfor), (), 16),
cts:element-word-query(xs:QName("para"), ($searchfor), (), 10)
))
return
cts:search(//(title|para),$query)[0 to 100]/cts:highlight(.,$query,element match {$cts:node})//match/*
You need to split the document (fragment it) into smaller documents. Every text node could be a document, with a stored original XPath so that the context is not lost.
I recommend that you look at the Search API (http://community.marklogic.com/pubs/5.0/books/search-dev-guide.pdf and http://community.marklogic.com/pubs/5.0/apidocs/SearchAPI.html). This API will give what you want, providing match nodes as well as the URIs for the actual documents. You should also find it easier to use for the general cases, although there will be edge cases where you will need to revert back to cts:search.
search:search is the specific function you will want to use. It will give you back responses similar to this:
<search:response total="1" start="1" page-length="10" xmlns=""
xmlns:search="http://marklogic.com/appservices/search">
<search:result index="1" uri="/foo.xml"
path="fn:doc("/foo.xml")" score="328"
confidence="0.807121" fitness="0.901397">
<search:snippet>
<search:match path="fn:doc("/foo.xml")/foo">
<search:highlight>hello</search:highlight></search:match>
</search:snippet>
</search:result>
<search:qtext>hello sample-property-constraint:boo</search:qtext>
<search:report id="SEARCH-FLWOR">(cts:search(fn:collection(),
cts:and-query((cts:word-query("hello", ("lang=en"), 1),
cts:properties-query(cts:word-query("boo", ("lang=en"), 1))),
()), ("score-logtfidf"), 1))[1 to 10]
</search:report>
<search:metrics>
<search:query-resolution-time>PT0.647S</search:query-resolution-time>
<search:facet-resolution-time>PT0S</search:facet-resolution-time>
<search:snippet-resolution-time>PT0.002S</search:snippet-resolution-time>
<search:total-time>PT0.651S</search:total-time>
</search:metrics>
</search:response>
Here you can see that every result has one or possibly more match elements defined.
How would you determine the relevance of a word independent of the document? Relevance is a measure of document relevance, not word relevance. I don't know how one would measure word relevance.
You could potentially return all words ordered by document relevance, then words for each document in "document order" which means the order in which they appear in the document. That would be relatively easy to do with search:search where you iterate over all results and extract each matching word. What would you present with each match? Its surrounding snippet?
Keep in mind that what you're asking for would potentially take a long time to execute.

Access list element using get()

I'm trying to use get() to access a list element in R, but am getting an error.
example.list <- list()
example.list$attribute <- c("test")
get("example.list") # Works just fine
get("example.list$attribute") # breaks
## Error in get("example.list$attribute") :
## object 'example.list$attribute' not found
Any tips? I am looping over a vector of strings which identify the list names, and this would be really useful.
Here's the incantation that you are probably looking for:
get("attribute", example.list)
# [1] "test"
Or perhaps, for your situation, this:
get("attribute", eval(as.symbol("example.list")))
# [1] "test"
# Applied to your situation, as I understand it...
example.list2 <- example.list
listNames <- c("example.list", "example.list2")
sapply(listNames, function(X) get("attribute", eval(as.symbol(X))))
# example.list example.list2
# "test" "test"
Why not simply:
example.list <- list(attribute="test")
listName <- "example.list"
get(listName)$attribute
# or, if both the list name and the element name are given as arguments:
elementName <- "attribute"
get(listName)[[elementName]]
If your strings contain more than just object names (e.g. the $ operator, as here), you can evaluate them as expressions as follows:
> string <- "example.list$attribute"
> eval(parse(text = string))
[1] "test"
If your strings are all of the type "object$attribute", you could also parse them into object/attribute, so you can still get the object, then extract the attribute with [[:
> parsed <- unlist(strsplit(string, "\\$"))
> get(parsed[1])[[parsed[2]]]
[1] "test"
flodel's answer worked for my application, so I'm going to post what I built on it, even though this is pretty uninspired. You can access each list element with a for loop, like so:
#============== List with five elements of non-uniform length ================#
example.list=
list(letters[1:5], letters[6:10], letters[11:15], letters[16:20], letters[21:26])
#===============================================================================#
#====== for loop that names and concatenates each consecutive element ========#
derp <- c()
for (i in 1:length(example.list)) {
  derp <- append(derp, eval(parse(text = example.list[i])))
}
derp  # Not a particularly useful application here, but it proves the point.
I'm using code like this for a function that pulls certain sets of columns from a data frame by their column names. The user enters a list whose elements each represent a different set of column names (each set is a group of items belonging to one measure), plus the big data frame containing all those columns. The for loop applies each consecutive list element as the set of column names for an internal function* applied only to that set of columns of the big data frame. Each iteration populates one column of an output matrix with the results for the corresponding subset of the data frame. After the for loop, the function returns that matrix.
Not sure if you're looking to do something similar with your list elements, but I'm happy I picked up this trick. Thanks to everyone for the ideas!
"Second example" / tangential info regarding application in graded response model factor scoring:
Here's the function I described above, just in case anyone wants to calculate graded response model factor scores* in large batches. Each column of the output matrix corresponds to an element of the list (i.e., a latent trait with ordinal indicator items specified by column name in the list element), and the rows correspond to the rows of the data frame used as input. Each row should presumably contain mutually dependent observations, as from a given individual, to whom the factor scores in the same row of the output matrix belong. Also, I should add that if all the items in a given list element use the exact same Likert scale rating options, the graded response model may be less appropriate for factor scoring than a rating scale model (cf. http://www.rasch.org/rmt/rmt143k.htm).
grmscores <- function(ColumnNameList, DataFrame) {
  require(ltm)  # (Rizopoulos, 2006)
  x <- matrix(NA, nrow = nrow(DataFrame), ncol = length(ColumnNameList))
  for (i in 1:length(ColumnNameList)) {  # flodel's magic featured below!
    x[, i] <- factor.scores(grm(DataFrame[, eval(parse(text = ColumnNameList[i]))]),
      resp.patterns = DataFrame[, eval(parse(text = ColumnNameList[i]))])$score.dat$z1
  }
  x
}
Reference
*Rizopoulos, D. (2006). ltm: An R package for latent variable modelling and item response theory analyses, Journal of Statistical Software, 17(5), 1-25. URL: http://www.jstatsoft.org/v17/i05/
