hierarchical fulltext search in neo4j - search

I have parent/child graph in neo4j: decision(parent, list of child decision) which have property name (string) I'm going to use for search. it perfectly find my decisions which have search term in name by query:
START d=node:node_auto_index({autoIndexQuery}) MATCH (d:Decision) RETURN d
I want to complicate this query to find decision which have search term in name AND ALSO have search term in names of its children:
Name of relation is CONTAINS (Decision CONTAINS decisions)

I think the following query should work:
START parent=node:node_auto_index({autoIndexQuery})
WITH parent
START child=node:node_auto_index({autoIndexQuery})
MATCH (parent:Decision)-[:contains]->(child:Decision)
where parent <> child
RETURN parent, child;
One issue here is that a full text query condition (I think) can only take place in a START block. This means you'd need to match both parent and child that way, then connect them with MATCH.
This might take some time to complete, depending on how many nodes you have matching, since the query will mostly see if this parent/child relationship exists between all of the nodes that match. But it should get the job done.

Related

django remove m2m instance when there are no more relations

In case we had the model:
class Publication(models.Model):
title = models.CharField(max_length=30)
class Article(models.Model):
publications = models.ManyToManyField(Publication)
According to: https://docs.djangoproject.com/en/4.0/topics/db/examples/many_to_many/, to create an object we must have both objects saved before we can create the relation:
p1 = Publication(title='The Python Journal')
p1.save()
a1 = Article(headline='Django lets you build web apps easily')
a1.save()
a1.publications.add(p1)
Now, if we called delete in either of those objects the object would be removed from the DB along with the relation between both objects. Up until this point I understand.
But is there any way of doing that, if an Article is removed, then, all the Publications that are not related to any Article will be deleted from the DB too? Or the only way to achieve that is to query first all the Articles and then iterate through them like:
to_delete = []
qset = a1.publications.all()
for publication in qset:
if publication.article_set.count() == 1:
to_delete(publication.id)
a1.delete()
Publications.filter(id__in=to_delete).delete()
But this has lots of problems, specially a concurrency one, since it might be that a publication gets used by another article between the call to .count() and publication.delete().
Is there any way of doing this automatically, like doing a "conditional" on_delete=models.CASCADE when creating the model or something?
Thanks!
I tried with #Ersain answer:
a1.publications.annotate(article_count=Count('article_set')).filter(article_count=1).delete()
Couldn't make it work. First of all, I couldn't find the article_set variable in the relationship.
django.core.exceptions.FieldError: Cannot resolve keyword 'article_set' into field. Choices are: article, id, title
And then, running the count filter on the QuerySet after filtering by article returned ALL the tags from the article, instead of just the ones with article_count=1. So finally this is the code that I managed to make it work with:
Publication.objects.annotate(article_count=Count('article')).filter(article_count=1).filter(article=a1).delete()
Definetly I'm not an expert, not sure if this is the best approach nor if it is really time expensive, so I'm open to suggestions. But as of now it's the only solution I found to perform this operation atomically.
You can remove the related objects using this query:
a1.publications.annotate(article_count=Count('article_set')).filter(article_count=1).delete()
annotate creates a temporary field for the queryset (alias field) which aggregates a number of related Article objects for each instance in the queryset of Publication objects, using Count function. Count is a built-in aggregation function in any SQL, which returns the number of rows from a query (a number of related instances in this case). Then, we filter out those results where article_count equals 1 and remove them.

AEM Query builder exclude a folder in search

I need to create a query where the params are like:
queryParams.put("path", "/content/myFolder");
queryParams.put("1_property", "myProperty");
queryParams.put("1_property.operation", "exists");
queryParams.put("p.limit", "-1");
But, I need to exclude a certain path inside this blanket folder , say: "/content/myFolder/wrongFolder" and search in all other folders (whose number keeps on varying)
Is there a way to do so ? I didn't find it exactly online.
I also tried the unequals operation as the parent path is being saved in a JCR property, but still no luck. I actually need unlike to avoid all occurrences of the path. But there is no such thing:
path=/main/path/to/search/in
group.1_property=cq:parentPath
group.1_property.operation=unequals
group.1_property.value=/path/to/be/avoided
group.2_property=myProperty
group.2_property.operation=exists
group.p.or=true
p.limit=-1
This is an old question but the reason you got more results later lies in the way in which you have constructed your query. The correct way to write a query like this would be something like:
path=/main/path/where
property=myProperty
property.operation=exists
property.value=true
group.p.or=true
group.p.not=true
group.1_path=/main/path/where/first/you/donot/want/to/search
group.2_path=/main/path/where/second/you/donot/want/to/search
p.limit=-1
A couple of notes: your group.p.or in your last comment would have applied to all of your groups because they weren't delineated by a group number. If you want an OR to be applied to a specific group (but not all groups), you would use:
path=/main/path/where
group.1_property=myProperty
group.1_property.operation=exists
group.1_property.value=true
2_group.p.or=true
2_group.p.not=true
2_group.3_path=/main/path/where/first/you/donot/want/to/search
2_group.4_path=/main/path/where/second/you/donot/want/to/search
Also, the numbers themselves don't matter - they don't have to be sequential, as long as property predicate numbers aren't reused, which will cause an exception to be thrown when the QB tries to parse it. But for readability and general convention, they're usually presented that way.
I presume that your example was just thrown together for this question, but obviously your "do not search" paths would have to be children of the main path you want to search or including them in the query would be superfluous, the query would not be searching them anyway otherwise.
AEM Query Builder Documentation for 6.3
Hope this helps someone in the future.
Using QueryBuilder you can execute:
map.put("group.p.not",true)
map.put("group.1_path","/first/path/where/you/donot/want/to/search")
map.put("group.2_path","/second/path/where/you/donot/want/to/search")
Also I've checked PredicateGroup's class API and they provide a setNegated method. I've never used it myself, but I think you can negate a group and combine it into a common predicate with the path you are searching on like:
final PredicateGroup doNotSearchGroup = new PredicateGroup();
doNotSearchGroup.setNegated(true);
doNotSearchGroup.add(new Predicate("path").set("path", "/path/where/you/donot/want/to/search"));
final PredicateGroup combinedPredicate = new PredicateGroup();
combinedPredicate.add(new Predicate("path").set("path", "/path/where/you/want/to/search"));
combinedPredicate.add(doNotSearchGroup);
final Query query = queryBuilder.createQuery(combinedPredicate);
Here is the query to specify operator on given specific group id.
path=/content/course/
type=cq:Page
p.limit=-1
1_property=jcr:content/event
group.1_group.1_group.daterange.lowerBound=2019-12-26T13:39:19.358Z
group.1_group.1_group.daterange.property=jcr:content/xyz
group.1_group.2_group.daterange.upperBound=2019-12-26T13:39:19.358Z
group.1_group.2_group.daterange.property=jcr:content/abc
group.1_group.3_group.relativedaterange.property=jcr:content/courseStartDate
group.1_group.3_group.relativedaterange.lowerBound=0
group.1_group.2_group.p.not=true
group.1_group.1_group.p.not=true

Search in hierarchical tree on neo4j

I have a database organized in several hierarchical trees.
Nodes are organized by number.Nodes that begin with the same number are interconnected by relationships. For example: (5)-[connect]-(50)-[connect]-(507)... etc. I want to search, for example, the node 301 starting from the first parent node: the node 3. How do I do this query in cypher?
If you want to search for a specific node starting from the first parent I would suggest following query:
MATCH (n {number:1})-[:CONNECT*0..]->(n1) return n, n1;
This query searches for the node with property number = 1 and searches for all children which are related through CONNECT relationship. If you want to search for a specific child node you have to change the query this way:
MATCH (n {number:1})-[:CONNECT*0..]->(n1 {number:101}) return n, n1;
In the *0.. part you can define until what depth you want to search, so you can also search for depth=n with *0..n. This documentation is a good place to start with the match/path clause: https://neo4j.com/docs/developer-manual/current/cypher/clauses/match/

Search Documents from two collections in MarkLogic

In Marklogic, I want to search between two collections by joining the id element of doc from collection1 to id element of doc from collection2. When it is matched i need the resulting document from both collections.
I have the below code, but it is very slow. How to use cts:search or search:search to achieve the same
for $i in collection('demographic')/individual,
$j in collection('membership')/membership[enrolleIndividualId/id/text() = $i/individual/id/text()])
return {$i,$j}
Update:
I should note that your sample is not valid XQuery: return element root { $i, $j } would be valid. Also, you should not use the /text() node selector, as it's behavior can be counterintuitive. You can compare elements directly in an XPath predicate ([enrolleIndividualId/id eq $i/individual/id]). Use /fn:string() in place of /text() if you need the contents of an element as a string. I'd also recommend using the atomic equality operator eq in place of the sequence equality operator = when directly comparing individual elements.
Original Answer:
There are several approaches to implementing joins in MarkLogic, but I would first question your data model. From the names of the elements in your sample query, it looks like you are using a relational model (individuals have memberships). MarkLogic is a document database, and it's optimized for denormalized documents. You will be much better served to process your data and generate new individual documents that each contain the relevant membership data.
That being said, here's how you could join your documents:
First, you will need range indices to write performant joins. If the id element from your sample query is not unique to individuals, you will need path range indices on enrolledIndividualId/id and individual/id, otherwise, a simple element range index on id will do.
The most common join pattern in MarkLogic uses a "shotgun-OR" query; first retrieving values from the lexicon backing a range index, and then constructing an or-query from those values to retrieve the relevant documents. This won't work directly in your case, as you want to retrieve both sides of the join. You can either run a search for each pair of documents, or run a single search for one side, and then an additional document read for each document.
pairs:
for $value in cts:values(cts:path-reference("individual/id"))
return
cts:search(/,
cts:or-query((
cts:and-query((
cts:collection-query("demographic"),
cts:path-range-query("individual/id", "=", $value))),
cts:and-query((
cts:collection-query("membership"),
cts:path-range-query("enrolledIndividualId/id", "=", $value))))),
"unfiltered")
shotgun-OR plus iteration:
for $doc in
cts:search(/,
cts:and-query((
cts:collection-query("demographic"),
cts:path-range-query("individual/id", "=",
cts:values(cts:path-reference("individual/id"))))),
"unfiltered")
return
cts:search(/,
cts:and-query((
cts:collection-query("membership"),
cts:path-range-query("enrolledIndividualId/id", "=", $doc/individual/id))),
"unfiltered")
As you can see, each approach requires I/O proportionate to the number of docs/values you want to join. If you only needed the shotgun-OR (ie, a query for documents based on criteria from other documents), you would only need to make two requests, the initial cts:values() call to retrieve values from a lexicon, and the cts:search() call using a query built from those values.
Note: the cts:query objects used in these examples could be used in conjunction with the Search API by means of the search:resolve() function.
Given your apparent data model, you will be much better served by processing your data into individual, de-normalized documents.

Neo4j-php retrieve node

I have been exclusively using cypher queries of this client for Neo4j because there is no out of the box way of doing many things. One of those id to get nodes. There is no way to retrieve them without knowing their id, which is very low level. Any idea on how to run a
$client->findOne('property','value');
?
It should be straightforward but it isn't from the documentation.
Make Indexes on the properties you want to search, from a newly created $personNode
$personIndex = new \Everyman\Neo4j\NodeIndex($client, 'person');
$personIndex->add($personNode, 'name', $personNode->name);
Then later to search, the new PHP object $personIndex will reference the same, populated index as above.
$personIndex = new \Everyman\Neo4j\NodeIndex($client, 'person');
$match = $personIndex->findOne('name', 'edoceo');

Resources