How to search records in all directory except one in Marklogic - search

I am trying to search XML's in my database using marklogic-java api with following restrictions :
1) A particular XML tag must contain my given value for ex : tradeId must equal to what I pass
2) Results must lie in collections provided by me
3) Results can lie anywhere in marklogic database except in one particular directory
I don't have the solution to point 3. Few of the records meeting my first two criteria have a URI starting with /TRADES/* and so i want to search everywhere except under directory "/TRADES".
This is how my code looks :
public static List<DocumentPage> getResults() throws KeyManagementException, NoSuchAlgorithmException,
KeyStoreException, CertificateException, IOException {
String tradeId = "XYZ";
DatabaseClient client = getDBClient();
QueryManager queryMgr = client.newQueryManager();
XMLDocumentManager xmlMngr = client.newXMLDocumentManager();
StringHandle rawHandle = new StringHandle();
rawHandle.withFormat(Format.XML).set(getQueryToFetchMessagesByTradeId(tradeId));
RawQueryByExampleDefinition querydef = queryMgr.newRawQueryByExampleDefinition(rawHandle);
querydef.setCollections("/messages/processed", /messages/toBeProcessed");
querydef.setDirectory("/");
return getDocumentPageList(querydef, client);
}
private static String getQueryToFetchMessagesByTradeId(String tradeId) {
String query = "<q:qbe xmlns:q=\"http://marklogic.com/appservices/querybyexample\">\n" + "<q:query>\n<q:word>"
+ "<tradeId tradeIdScheme=" + "\"http://www.abcd.com/internalid/trade-id\"" + ">" + tradeId
+ "</tradeId></q:word>\n" + "</q:query>\n" + "</q:qbe>";
return query;
}
Any help is highly appreciated.

I don't see an option to compose metadata (directory) queries in query by example syntax. So you'll have to use Structured Queries instead. I find them more readable in Java anyway since they don't require string concatenation. Use StructuredQueryBuilder like so:
StructuredQueryBuilder qb = new StructuredQueryBuilder();
StructuredQueryDefinition querydef =
qb.and(
qb.containerQuery(
qb.element(new QName("http://www.abcd.com/internalid/trade-id", "tradeId")),
qb.and(
qb.term( tradeId ),
qb.value(
qb.elementAttribute(
qb.element(new QName("http://www.abcd.com/internalid/trade-id", "tradeId")),
qb.attribute("tradeIdScheme")
),
"http://www.abcd.com/internalid/trade-id"
)
)
),
qb.not(
qb.directory(1, "/TRADES/")
)
);
querydef.setCollections("/messages/processed", "/messages/toBeProcessed")
return getDocumentPageList(querydef, client);

Related

How to get a list of all documents that contain the search string in their document id in Firestore's collection? [duplicate]

Is it possible to query a firestore collection to get all document that starts with a specific string?
I have gone through the documentation but do not find any suitable query for this.
You can but it's tricky. You need to search for documents greater than or equal to the string you want and less than a successor key.
For example, to find documents containing a field 'foo' staring with 'bar' you would query:
db.collection(c)
.where('foo', '>=', 'bar')
.where('foo', '<', 'bas');
This is actually a technique we use in the client implementation for scanning collections of documents matching a path. Our successor key computation is called by a scanner which is looking for all keys starting with the current user id.
same as answered by Gil Gilbert.
Just an enhancement and some sample code.
use String.fromCharCode and String.charCodeAt
var strSearch = "start with text here";
var strlength = strSearch.length;
var strFrontCode = strSearch.slice(0, strlength-1);
var strEndCode = strSearch.slice(strlength-1, strSearch.length);
var startcode = strSearch;
var endcode= strFrontCode + String.fromCharCode(strEndCode.charCodeAt(0) + 1);
then filter code like below.
db.collection(c)
.where('foo', '>=', startcode)
.where('foo', '<', endcode);
Works on any Language and any Unicode.
Warning: all search criteria in firestore is CASE SENSITIVE.
Extending the previous answers with a shorter version:
const text = 'start with text here';
const end = text.replace(/.$/, c => String.fromCharCode(c.charCodeAt(0) + 1));
query
.where('stringField', '>=', text)
.where('stringField', '<', end);
IRL example
async function search(startsWith = '') {
let query = firestore.collection(COLLECTION.CLIENTS);
if (startsWith) {
const end = startsWith.replace(
/.$/, c => String.fromCharCode(c.charCodeAt(0) + 1),
);
query = query
.where('firstName', '>=', startsWith)
.where('firstName', '<', end);
}
const result = await query
.orderBy('firstName')
.get();
return result;
}
If you got here looking for a Dart/Flutter version
Credit to the java answer by Kyo
final strFrontCode = term.substring(0, term.length - 1);
final strEndCode = term.characters.last;
final limit =
strFrontCode + String.fromCharCode(strEndCode.codeUnitAt(0) + 1);
final snap = await FirebaseFirestore.instance
.collection('someCollection')
.where('someField', isGreaterThanOrEqualTo: term)
.where('someField', isLessThan: limit)
.get();
I found this, which works perfectly for startsWith
const q = query(
collection(firebaseApp.db, 'capturedPhotos'),
where('name', '>=', name),
where('name', '<=', name + '\uf8ff')
)
The above are correct! Just wanted to give an updated answer!
var end = s[s.length-1]
val newEnding = ++end
var newString = s
newString.dropLast(1)
newString += newEnding
query
.whereGreaterThanOrEqualTo(key, s)
.whereLessThan(key, newString)
.get()

How to deal with SQL Injection in OrientDB using nodejs?

I'm using the orientjs library to perform operations in the Orient Database. I read in the documentation that it's possible to use parameter-style queries like the following:
db.query(
'SELECT name, ba FROM Player '
+ 'WHERE ba >= :ba AND team = ":team"',
{params: {
ba: targetBA,
team: targetTeam }
}, limit: 20
).then(function(hitters){
console.log(hitters)
});
My question is: Is it enough to prevent SQL injection? Because I didn't find information about that in the NodeJS API. In the case of Java, there is a 'Prepared Query' concept, I'm not sure if they are refering to the same thing.
Seems to be secure, I'm trying with this code (yours taken from the wiki is a bit buggy):
var name='admin';
db.open().then(function() {
return db.query(
"SELECT * FROM OUser "
+ "WHERE name = :name",
{params:{
name: name
}
});
}).then(function(res){
console.log(res);
db.close().then(function(){
console.log('closed');
});
});
First of all, the query is parsed as SELECT * FROM OUser WHERE name = "admin" (observed with the Studio Query Profiler).
As expected, I get the admin user record.
Since the params are evaluated directly as String, there's non need quote them (e.g. :name not ':name'). So there is no way to inject something like ' OR '1'='1 or any ; drop something;
Here are some test I did:
var name='; create class p;';
returns no records;
evaluated by orient as: SELECT * FROM OUser WHERE name = "; create class p;"
var name="' OR '1'='1";
returns no records;
evaluated as: SELECT * FROM OUser WHERE name = "' OR '1'='1"
var name='" OR "1"="1';
returns no records;
evaluated as: SELECT * FROM OUser WHERE name = "\" OR \"1\"=\"1"
quoting the param name in the query: "WHERE name = ':name'"
evaluated as: SELECT * FROM OUser WHERE name = ':name'
Feel free to try more combinations, in my opinion seems quite safe.

Finding all properties for a schema-less vertex class

I have a class Node extends V. I add instances to Node with some set of document type information provided. I want to query the OrientDB database and return some information from Node; to display this in a formatted way I want a list of all possible field names (in my application, there are currently 115 field names, only one of which is a property used as an index)
To do this in pyorient, the only solution I found so far is (client is the name of the database handle):
count = client.query("SELECT COUNT(*) FROM Node")[0].COUNT
node_records = client.query("SELECT FROM Node LIMIT {0}".format(count))
node_key_list = set([])
for node in node_records:
node_key_list |= node.oRecordData.keys()
I figured that much out pretty much through trial and error. It isn't very efficient or elegant. Surely there must be a way to have the database return a list of all possible fields for a class or any other document-type object. Is there a simple way to do this through either pyorient or the SQL commands?
I tried your case with this dataset:
And this is the structure of my class TestClass:
As you can see from my structure only name, surname and timeStamp have been created in schema-full mode, instead nameSchemaLess1 and nameSchemaLess1 have been inserted into the DB in schema-less mode.
After having done that, you could create a Javascript function in OrientDB Studio or Console (as explained here) and subsequently you can recall it from pyOrient by using a SQL command.
The following posted function retrieves all the fields names of the class TestClass without duplicates:
Javascript function:
var g = orient.getGraph();
var fieldsList = [];
var query = g.command("sql", "SELECT FROM TestClass");
for (var x = 0; x < query.length; x++){
var fields = query[x].getRecord().fieldNames();
for (var y = 0; y < fields.length; y++) {
if (fieldsList == false){
fieldsList.push(fields[y]);
} else {
var fieldFound = false;
for (var z = 0; z < fieldsList.length; z++){
if (fields[y] == fieldsList[z]){
fieldFound = true;
break;
}
}
if (fieldFound != true){
fieldsList.push(fields[y]);
}
}
}
}
return fieldsList;
pyOrient code:
import pyorient
db_name = 'TestDatabaseName'
print("Connecting to the server...")
client = pyorient.OrientDB("localhost", 2424)
session_id = client.connect("root", "root")
print("OK - sessionID: ", session_id, "\n")
if client.db_exists(db_name, pyorient.STORAGE_TYPE_PLOCAL):
client.db_open(db_name, "root", "root")
functionCall = client.command("SELECT myFunction() UNWIND myFunction")
for idx, val in enumerate(functionCall):
print("Field name: " + val.myFunction)
client.db_close()
Output:
Connecting to the server...
OK - sessionID: 54
Field name: name
Field name: surname
Field name: timeStamp
Field name: out_testClassEdge
Field name: nameSchemaLess1
Field name: in_testClassEdge
Field name: nameSchemaLess2
As you can see all of the fields names, both schema-full and schema-less, have been retrieved.
Hope it helps
Luca's answer worked. I modified it to fit my tastes/needs. Posting here to increase the amount of OrientDB documentation on Stack Exchange. I took Luca's answer and translated it to groovy. I also added a parameter to select the class to get fields for and removed the UNWIND in the results. Thank you to Luca for helping me learn.
Groovy code for function getFieldList with 1 parameter (class_name):
g = orient.getGraph()
fieldList = [] as Set
ret = g.command("sql", "SELECT FROM " + class_name)
for (record in ret) {
fieldList.addAll(record.getRecord().fieldNames())
}
return fieldList
For the pyorient part, removing the database connection it looks like this:
node_keys = {}
ret = client.command("SELECT getFieldList({0})".format("'Node'"))
node_keys = ret[0].oRecordData['getFieldList']
Special notice to the class name; in the string passed to client.command(), the parameter must be encased in quotes.

Distinct values in Azure Search Suggestions?

I am offloading my search feature on a relational database to Azure Search. My Products tables contains columns like serialNumber, PartNumber etc.. (there can be multiple serialNumbers with the same partNumber).
I want to create a suggestor that can autocomplete partNumbers. But in my scenario I am getting a lot of duplicates in the suggestions because the partNumber match was found in multiple entries.
How can I solve this problem ?
The Suggest API suggests documents, not queries. If you repeat the partNumber information for each serialNumber in your index and then suggest based on partNumber, you will get a result for each matching document. You can see this more clearly by including the key field in the $select parameter. Azure Search will eliminate duplicates within the same document, but not across documents. You will have to do that on the client side, or build a secondary index of partNumbers just for suggestions.
See this forum thread for a more in-depth discussion.
Also, feel free to vote on this UserVoice item to help us prioritize improvements to Suggestions.
I'm facing this problem myself. My solution does not involve a new index (this will only get messy and cost us money).
My take on this is a while-loop adding 'UserIdentity' (in your case, 'partNumber') to a filter, and re-search until my take/top-limit is met or no more suggestions exists:
public async Task<List<MachineSuggestionDTO>> SuggestMachineUser(string searchText, int take, string[] searchFields)
{
var indexClientMachine = _searchServiceClient.Indexes.GetClient(INDEX_MACHINE);
var suggestions = new List<MachineSuggestionDTO>();
var sp = new SuggestParameters
{
UseFuzzyMatching = true,
Top = 100 // Get maximum result for a chance to reduce search calls.
};
// Add searchfields if set
if (searchFields != null && searchFields.Count() != 0)
{
sp.SearchFields = searchFields;
}
// Loop until you get the desired ammount of suggestions, or if under desired ammount, the maximum.
while (suggestions.Count < take)
{
if (!await DistinctSuggestMachineUser(searchText, take, searchFields, suggestions, indexClientMachine, sp))
{
// If no more suggestions is found, we break the while-loop
break;
}
}
// Since the list might me bigger then the take, we return a narrowed list
return suggestions.Take(take).ToList();
}
private async Task<bool> DistinctSuggestMachineUser(string searchText, int take, string[] searchFields, List<MachineSuggestionDTO> suggestions, ISearchIndexClient indexClientMachine, SuggestParameters sp)
{
var response = await indexClientMachine.Documents.SuggestAsync<MachineSearchDocument>(searchText, SUGGESTION_MACHINE, sp);
if(response.Results.Count > 0){
// Fix filter if search is triggered once more
if (!string.IsNullOrEmpty(sp.Filter))
{
sp.Filter += " and ";
}
foreach (var result in response.Results.DistinctBy(r => new { r.Document.UserIdentity, r.Document.UserName, r.Document.UserCode}).Take(take))
{
var d = result.Document;
suggestions.Add(new MachineSuggestionDTO { Id = d.UserIdentity, Namn = d.UserNamn, Hkod = d.UserHkod, Intnr = d.UserIntnr });
// Add found UserIdentity to filter
sp.Filter += $"UserIdentity ne '{d.UserIdentity}' and ";
}
// Remove end of filter if it is run once more
if (sp.Filter.EndsWith(" and "))
{
sp.Filter = sp.Filter.Substring(0, sp.Filter.LastIndexOf(" and ", StringComparison.Ordinal));
}
}
// Returns false if no more suggestions is found
return response.Results.Count > 0;
}
public async Task<List<string>> SuggestionsAsync(bool highlights, bool fuzzy, string term)
{
SuggestParameters sp = new SuggestParameters()
{
UseFuzzyMatching = fuzzy,
Top = 100
};
if (highlights)
{
sp.HighlightPreTag = "<em>";
sp.HighlightPostTag = "</em>";
}
var suggestResult = await searchConfig.IndexClient.Documents.SuggestAsync(term, "mysuggestion", sp);
// Convert the suggest query results to a list that can be displayed in the client.
return suggestResult.Results.Select(x => x.Text).Distinct().Take(10).ToList();
}
After getting top 100 and using distinct it works for me.
You can use the Autocomplete API for that where does the grouping by default. However, if you need more fields together with the result, like, the partNo plus description it doesn't support it. The partNo will be distinct though.

Reconstructing an ODataQueryOptions object and GetInlineCount returning null

In an odata webapi call which returns a PageResult I extract the requestUri from the method parameter, manipulate the filter terms and then construct a new ODataQueryOptions object using the new uri.
(The PageResult methodology is based on this post:
http://www.asp.net/web-api/overview/odata-support-in-aspnet-web-api/supporting-odata-query-options )
Here is the raw inbound uri which includes %24inlinecount=allpages
http://localhost:59459/api/apiOrders/?%24filter=OrderStatusName+eq+'Started'&filterLogic=AND&%24skip=0&%24top=10&%24inlinecount=allpages&_=1376341370337
Everything works fine in terms of the data returned except Request.GetInLineCount returns null.
This 'kills' paging on the client side as the client ui elements don't know the total number of records.
There must be something wrong with how I'm constructing the new ODataQueryOptions object.
Please see my code below. Any help would be appreciated.
I suspect this post may contain some clues https://stackoverflow.com/a/16361875/1433194 but I'm stumped.
public PageResult<OrderVm> Get(ODataQueryOptions<OrderVm> options)
{
var incomingUri = options.Request.RequestUri.AbsoluteUri;
//manipulate the uri here to suit the entity model
//(related to a transformation needed for enumerable type OrderStatusId )
//e.g. the query string may include %24filter=OrderStatusName+eq+'Started'
//I manipulate this to %24filter=OrderStatusId+eq+'Started'
ODataQueryOptions<OrderVm> options2;
var newUri = incomingUri; //pretend it was manipulated as above
//Reconstruct the ODataQueryOptions with the modified Uri
var request = new HttpRequestMessage(HttpMethod.Get, newUri);
//construct a new options object using the new request object
options2 = new ODataQueryOptions<OrderVm>(options.Context, request);
//Extract a queryable from the repository. contents is an IQueryable<Order>
var contents = _unitOfWork.OrderRepository.Get(null, o => o.OrderByDescending(c => c.OrderId), "");
//project it onto the view model to be used in a grid for display purposes
//the following projections etc work fine and do not interfere with GetInlineCount if
//I avoid the step of constructing and using a new options object
var ds = contents.Select(o => new OrderVm
{
OrderId = o.OrderId,
OrderCode = o.OrderCode,
CustomerId = o.CustomerId,
AmountCharged = o.AmountCharged,
CustomerName = o.Customer.FirstName + " " + o.Customer.LastName,
Donation = o.Donation,
OrderDate = o.OrderDate,
OrderStatusId = o.StatusId,
OrderStatusName = ""
});
//note the use of 'options2' here replacing the original 'options'
var settings = new ODataQuerySettings()
{
PageSize = options2.Top != null ? options2.Top.Value : 5
};
//apply the odata transformation
//note the use of 'options2' here replacing the original 'options'
IQueryable results = options2.ApplyTo(ds, settings);
//Update the field containing the string representation of the enum
foreach (OrderVm row in results)
{
row.OrderStatusName = row.OrderStatusId.ToString();
}
//get the total number of records in the result set
//THIS RETURNS NULL WHEN USING the 'options2' object - THIS IS MY PROBLEM
var count = Request.GetInlineCount();
//create the PageResult object
var pr = new PageResult<OrderVm>(
results as IEnumerable<OrderVm>,
Request.GetNextPageLink(),
count
);
return pr;
}
EDIT
So the corrected code should read
//create the PageResult object
var pr = new PageResult<OrderVm>(
results as IEnumerable<OrderVm>,
request.GetNextPageLink(),
request.GetInlineCount();
);
return pr;
EDIT
Avoided the need for a string transformation of the enum in the controller method by applying a Json transformation to the OrderStatusId property (an enum) of the OrderVm class
[JsonConverter(typeof(StringEnumConverter))]
public OrderStatus OrderStatusId { get; set; }
This does away with the foreach loop.
InlineCount would be present only when the client asks for it through the $inlinecount query option.
In your modify uri logic add the query option $inlinecount=allpages if it is not already present.
Also, there is a minor bug in your code. The new ODataQueryOptions you are creating uses a new request where as in the GetInlineCount call, you are using the old Request. They are not the same.
It should be,
var count = request.GetInlineCount(); // use the new request that your created, as that is what you applied the query to.

Resources