Why do the API calls not work in Gremlin Python? - tinkerpop3

In gremlin-python I can do:
for e in g.E().toList():
    print(e)
and will get a result like
e[11][4-created->3]
e[12][6-created->3]
e[7][1-knows->2]
e[8][1-knows->4]
e[9][1-created->3]
e[10][4-created->5]
According to
http://tinkerpop.apache.org/javadocs/3.4.3/core/org/apache/tinkerpop/gremlin/structure/Edge.html
an Edge has an inVertex() accessor.
Translating this idea to python leads to:
for e in g.E().toList():
    print(e.inVertex().id)
and the error
AttributeError: 'Edge' object has no attribute 'inVertex'
The same holds true for quite a few other "simple" API calls. For example,
for e in g.E().toList():
    print(e.property('weight'))
also fails.
Why is this so, and what is the workaround?

In TinkerPop graph elements (e.g. vertices, edges, vertex properties) often go through a process of "detachment". Gremlin traversals that return graph elements from remote sources go through this process and, in these cases, are typically detached to "references". A reference provides just enough information to re-attach to the remote graph. For that process of re-attachment it only needs id and label. Therefore, properties are not returned. It is the same for all languages that Gremlin supports, not just Python (though, I will contradict this statement a bit at the end in a final note).
Speaking specifically for Gremlin Language Variants, like Python, these implementations of Gremlin do not have a full Gremlin Virtual Machine to process traversals and it was never an intent to build full graph structures on the Python side - only graph elements with references to match what would be returned from remote sources. That also reduces the amount of code on the Python side that needs to be maintained because TinkerPop can rely on standard primitives like Dictionary, List etc. that exist in all programming languages.
Technical history aside, the return of references forces users to write better Gremlin according to best practices. Users should specify exactly what data they want in their Gremlin traversal. Rather than:
g.V().hasLabel('customer')
you would prefer:
g.V().hasLabel('customer').valueMap(true,'name')
or in 3.4.4:
g.V().hasLabel('customer').elementMap('name')
which returns a less nested structure than valueMap(). elementMap() works very nicely for edges and is a replacement for more complex approaches via project() to get the data you're requesting from an edge in your question:
gremlin> g.V().has('person','name','marko').elementMap()
==>[id:1,label:person,name:marko,age:29]
gremlin> g.V().has('person','name','marko').elementMap('name')
==>[id:1,label:person,name:marko]
gremlin> g.V().has('person','name','marko').properties('name').elementMap()
==>[id:0,key:name,value:marko]
gremlin> g.E(11).elementMap()
==>[id:11,label:created,IN:[id:3,label:software],OUT:[id:4,label:person],weight:0.4]
It's really no different in SQL where you likely wouldn't do:
SELECT * FROM customer
but instead:
SELECT name FROM customer
Returning references and forcing users to be a bit more explicit about what they return also solves a massive problem with multi/meta-properties. If a user returns vertices and inadvertently returns a "fat" vertex (e.g. a vertex with 1 million properties on it), it will have a significant impact to the server in trying to return that. By detaching to reference, there is no loophole for users to get stuck in.
All that said, as of 3.4.3, there are still points of inconsistency with detachment, and in some cases in Java detachment works in ways other than reference detachment. TinkerPop has been trying to become completely consistent in this approach, but has been trying to do it in a fashion that does not break existing code within existing release lines. This probably isn't the answer you're looking for, but at least it helps explain some of the reasoning and history for why things are as they are.
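To tie this back to the question, here is a minimal sketch of reading the in-vertex ID and the weight from a single elementMap() result on the Python side. It runs against a hand-written dictionary that mimics the g.E(11).elementMap() output shown earlier; the function name summarize_edge and the plain string keys ("id", "label", "IN", "OUT") are assumptions made for illustration, and the real driver may key these entries by enum values rather than strings.

```python
def summarize_edge(edge_map):
    """Build a short summary from an elementMap()-style dict for one edge."""
    structural_keys = ("id", "label", "IN", "OUT")  # assumed key names
    return {
        "edge_id": edge_map["id"],
        "label": edge_map["label"],
        "out_vertex_id": edge_map["OUT"]["id"],
        "in_vertex_id": edge_map["IN"]["id"],
        # Everything that is not a structural key is treated as a property.
        "properties": {
            key: value
            for key, value in edge_map.items()
            if key not in structural_keys
        },
    }


# Hand-written stand-in for the g.E(11).elementMap() console result.
sample = {
    "id": 11,
    "label": "created",
    "IN": {"id": 3, "label": "software"},
    "OUT": {"id": 4, "label": "person"},
    "weight": 0.4,
}

summary = summarize_edge(sample)
print(summary["in_vertex_id"], summary["properties"]["weight"])
```

The point of the sketch is that one explicit traversal result already carries everything the question asked for, so no accessor calls on the edge object are needed.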

toList() executes the Gremlin query and packs the result into a list.
Thus, you cannot continue the traversal with inVertex().
To get the entering vertices you should run:
for v in g.E().inV().toList():
    print(v)
To get the edge properties and both vertices properties in a single query, you can use project:
g.E().project("values", "in", "out")
    .by(valueMap(true))
    .by(inV().valueMap(true))
    .by(outV().valueMap(true))

Looking at the source code at https://github.com/apache/tinkerpop/blob/master/gremlin-python/src/main/jython/gremlin_python/structure/graph.py (see below) the following properties are directly accessible:
for all elements:
e.id
e.label
for edges:
e.inV
e.outV
The bad news is that properties must first be retrieved separately, so it is not so easy to access ids, labels and properties in a single Python statement.
class Element(object):
    def __init__(self, id, label):
        self.id = id
        self.label = label

    def __eq__(self, other):
        return isinstance(other, self.__class__) and self.id == other.id

    def __hash__(self):
        return hash(self.id)


class Vertex(Element):
    def __init__(self, id, label="vertex"):
        Element.__init__(self, id, label)

    def __repr__(self):
        return "v[" + str(self.id) + "]"


class Edge(Element):
    def __init__(self, id, outV, label, inV):
        Element.__init__(self, id, label)
        self.outV = outV
        self.inV = inV

    def __repr__(self):
        return "e[" + str(self.id) + "][" + str(self.outV.id) + "-" + self.label + "->" + str(self.inV.id) + "]"
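As a small illustration of what the Edge class above implies, inV and outV are plain attributes rather than methods, so the in-vertex ID is reached with e.inV.id. The sketch below uses SimpleNamespace objects as hypothetical stand-ins that only mirror those attribute names; it does not import or exercise gremlin-python itself.

```python
from types import SimpleNamespace

# Stand-in for a detached edge e[11][4-created->3]; only the attribute
# names (id, label, outV, inV) follow the class definitions above.
edge = SimpleNamespace(
    id=11,
    label="created",
    outV=SimpleNamespace(id=4, label="person"),
    inV=SimpleNamespace(id=3, label="software"),
)

# Attribute access works because inV is stored data, not a method.
in_vertex_id = edge.inV.id
out_vertex_id = edge.outV.id

# There is no Java-style accessor, which is why the question's call fails.
has_in_vertex_method = hasattr(edge, "inVertex")

print(in_vertex_id, out_vertex_id, has_in_vertex_method)
```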

Related

How to retrieve nested output from XCom using taskflow syntax in Airflow

Well, I know this seems to be possible; I just don't know how. To begin with, I am using traditional operators (without the @task decorator), but I am interested in the XComArg output format returned by these operators, which can be used in downstream tasks. Below is a sample:
task_1 = DummyOperator(
    task_id='task_1'
)  # returns {"data": {"foo": [{"cmd": "ls"}]}}
task_2 = BashOperator(
    task_id='task_2',
    bash_command=task_1.output['return_value']['data']['foo'][0]['cmd']  # does not give what I need and returns null.
    # bash_command="{{ ti.xcom_pull(task_ids='task_1', key='return_value')['data']['foo'][0]['cmd'] }}"  # gives what I need
)
In this example, what works for me is pure Jinja templating; the new XComArg syntax does not. I have tried setting render_template_as_native_obj=True in the DAG configuration, but it does not change anything. I want to use .output, which returns an XComArg object and does return the complete dict, but I have not been able to access nested keys as shown above. I have also tried converting the string to JSON and similar combinations, but nothing seems to work.
Unfortunately, retrieving nested values from XComArgs is a limitation of the TaskFlow API.
The TaskFlow API uses __getitem__ to override the XCom key to use. In your example, the key ends up being "cmd" rather than the value of what cmd represents in that nested object. You'll have to use the original ti.xcom_pull() method until that limitation is addressed.
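The key-override behaviour described above can be sketched with a tiny stand-in class. FakeXComArg is hypothetical and is not Airflow's XComArg; it only imitates the idea that indexing swaps the XCom key instead of descending into the stored value.

```python
class FakeXComArg:
    """Hypothetical stand-in that mimics key replacement on indexing."""

    def __init__(self, task_id, key="return_value"):
        self.task_id = task_id
        self.key = key

    def __getitem__(self, item):
        # Indexing does not look inside the stored value; it just
        # returns a new reference whose key is the last item used.
        return FakeXComArg(self.task_id, key=item)


output = FakeXComArg("task_1")
chained = output["return_value"]["data"]["foo"][0]["cmd"]
print(chained.key)  # the key is now "cmd", not the nested value

# What the Jinja expression does instead: pull the whole value first,
# then index into the ordinary Python dict.
stored = {"data": {"foo": [{"cmd": "ls"}]}}
cmd = stored["data"]["foo"][0]["cmd"]
print(cmd)
```

Under this model, the chained lookup asks for an XCom stored under the key "cmd", which does not exist, and that would explain the null result.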

django remove m2m instance when there are no more relations

In case we had the model:
class Publication(models.Model):
    title = models.CharField(max_length=30)

class Article(models.Model):
    publications = models.ManyToManyField(Publication)
According to: https://docs.djangoproject.com/en/4.0/topics/db/examples/many_to_many/, to create an object we must have both objects saved before we can create the relation:
p1 = Publication(title='The Python Journal')
p1.save()
a1 = Article(headline='Django lets you build web apps easily')
a1.save()
a1.publications.add(p1)
Now, if we called delete in either of those objects the object would be removed from the DB along with the relation between both objects. Up until this point I understand.
But is there any way to make it so that, when an Article is removed, all the Publications that are no longer related to any Article are deleted from the DB too? Or is the only way to achieve that to first query the Article's publications and iterate through them, like:
to_delete = []
qset = a1.publications.all()
for publication in qset:
    if publication.article_set.count() == 1:
        to_delete.append(publication.id)
a1.delete()
Publication.objects.filter(id__in=to_delete).delete()
But this has lots of problems, especially a concurrency one, since a publication might get used by another article between the call to .count() and the final delete().
Is there any way of doing this automatically, like doing a "conditional" on_delete=models.CASCADE when creating the model or something?
Thanks!
I tried @Ersain's answer:
a1.publications.annotate(article_count=Count('article_set')).filter(article_count=1).delete()
Couldn't make it work. First of all, I couldn't find the article_set variable in the relationship.
django.core.exceptions.FieldError: Cannot resolve keyword 'article_set' into field. Choices are: article, id, title
And then, running the count filter on the QuerySet after filtering by article returned ALL the tags from the article, instead of just the ones with article_count=1. So finally this is the code that I managed to make it work with:
Publication.objects.annotate(article_count=Count('article')).filter(article_count=1).filter(article=a1).delete()
I'm definitely not an expert, and I'm not sure whether this is the best approach or whether it is expensive in time, so I'm open to suggestions. But as of now it's the only solution I found to perform this operation atomically.
You can remove the related objects using this query:
a1.publications.annotate(article_count=Count('article_set')).filter(article_count=1).delete()
annotate creates a temporary field for the queryset (alias field) which aggregates a number of related Article objects for each instance in the queryset of Publication objects, using Count function. Count is a built-in aggregation function in any SQL, which returns the number of rows from a query (a number of related instances in this case). Then, we filter out those results where article_count equals 1 and remove them.
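For readers who want to see the counting idea at the SQL level, here is a rough sketch using Python's built-in sqlite3 module. The table and column names are assumptions chosen to resemble a many-to-many join table, and the statement is an illustration of the GROUP BY / HAVING approach, not the SQL Django actually generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE publication (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE article_publications (article_id INTEGER, publication_id INTEGER);
    INSERT INTO publication VALUES (1, 'The Python Journal');
    INSERT INTO publication VALUES (2, 'Science News');
    -- Article 1 uses both publications; article 2 uses only publication 2.
    INSERT INTO article_publications VALUES (1, 1);
    INSERT INTO article_publications VALUES (1, 2);
    INSERT INTO article_publications VALUES (2, 2);
""")


def delete_exclusive_publications(conn, article_id):
    """Delete publications whose only related article is article_id."""
    with conn:  # run the delete as a single transaction
        conn.execute(
            """
            DELETE FROM publication
            WHERE id IN (
                SELECT publication_id
                FROM article_publications
                GROUP BY publication_id
                HAVING COUNT(article_id) = 1 AND MAX(article_id) = ?
            )
            """,
            (article_id,),
        )


delete_exclusive_publications(conn, 1)
remaining = [row[0] for row in conn.execute("SELECT id FROM publication ORDER BY id")]
print(remaining)
```

Because the count and the delete happen in one statement, there is no window between a separate .count() call and the delete in which another article could start using the publication.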

how to do Gremlin contain search for both number and string

Neptune 1.0.2.1 + Gremlin + nodejs.
I have a vertex and a property, e.g. vertex Device with property Test. The Test property can store different types of data, e.g. numbers and strings:
Vertex 1 - Test = ['ABCD','xyz']
Vertex 2 - Test = [123,'XYZ']
I want to do a 'containing' search, e.g. Test=A or Test=123, regardless of the datatype.
I was trying
queryText = 'BC' // this throws an error
or queryText = 123 // this actually works
// I expect both cases to return a result.
g.V().hasLabel('Device').or(__.has('Test', parseFloat(queryText)), __.has('Test', textP.containing(queryText)));
but I get an 'InternalFailureException' error.
Is it possible to write a single query regardless of the datatype?
If not, is it at least possible to make textP.containing work with multiple queries, assuming I know the datatype? Right now the containing search throws an error if the property contains a number.
It looks like you have the closing bracket in the wrong place inside the or() step. You need to close the first has step before the comma.
In your example
g.V().hasLabel('Device').or(__.has('Test', parseFloat(queryText), __.has('Test', textP.containing(queryText))));
Which should be
g.V().hasLabel('Device').or(__.has('Test', parseFloat(queryText)), __.has('Test', textP.containing(queryText)));
EDITED and UPDATED
With the corrected query and additional clarification about the data model containing different types for the same property key, I was able to reproduce what you are seeing. However, the same behavior can be seen using TinkerGraph as well as Neptune. The error message generated is a little different but the meaning is the same. Given the fact that TinkerGraph behaves the same way, I am of the opinion that Neptune is behaving consistently with the "reference" implementation. That said, this raises a question as to whether the TextP predicates should be smarter and check the type of the property before attempting the test.
gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> g.addV('test').property('x',12.5)
==>v[0]
gremlin> g.addV('test').property('x','ABCDEF')
==>v[2]
gremlin> g.V().hasLabel('test').or(has('x',12.3),has('x',TextP.containing('CDE')))
java.math.BigDecimal cannot be cast to java.lang.String
Type ':help' or ':h' for help.
Display stack trace? [yN]
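The idea of checking the value's type before applying a text test can be illustrated outside of Gremlin. The following is a plain Python sketch, not a TinkerPop or Neptune API; the function names matches and any_value_matches are made up for this example, and it operates on property values that have already been fetched to the client.

```python
def matches(value, query):
    """Substring match for strings, numeric equality for numbers."""
    if isinstance(value, str):
        return str(query) in value
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        try:
            return float(query) == float(value)
        except (TypeError, ValueError):
            # The query is not numeric, so it cannot equal a number.
            return False
    return False


def any_value_matches(values, query):
    """Return True if any value of a multi-valued property matches."""
    return any(matches(value, query) for value in values)


vertex_1 = ["ABCD", "xyz"]
vertex_2 = [123, "XYZ"]

print(any_value_matches(vertex_1, "BC"))
print(any_value_matches(vertex_2, 123))
print(any_value_matches(vertex_2, "BC"))
```

The type check is what avoids the cast failure seen in the console: a number is never handed to a string-only comparison.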
ADDITIONAL UPDATE
I created a Jira issue so the Apache TinkerPop community can consider making a change to the TextP predicates.
https://issues.apache.org/jira/browse/TINKERPOP-2375

Google Cloud Datastore Cursor with google.cloud.ndb

I am working with Google Cloud Datastore using the latest google.cloud.ndb library.
I am trying to implement pagination using a Cursor with the following code.
It is not fetching the data correctly.
[1] To Fetch Data:
query_01 = MyModel.query()
f = query_01.fetch_page_async(limit=5)
This code works fine and fetches 5 entities from MyModel
I want to implement pagination that can be integrated with a web frontend.
[2] To Fetch Next Set of Data
from google.cloud.ndb._datastore_query import Cursor
nextpage_value = "2"
nextcursor = Cursor(cursor=nextpage_value.encode()) # Converts to bytes
query_01 = MyModel.query()
f = query_01.fetch_page_async(limit=5, start_cursor= nextcursor)
[3] To Fetch Previous Set of Data
previouspage_value = "1"
prevcursor = Cursor(cursor=previouspage_value.encode())
query_01 = MyModel.query()
f = query_01.fetch_page_async(limit=5, start_cursor=prevcursor)
Code blocks [2] and [3] do not fetch paginated data, but return the same results as code block [1].
Please note I'm working with Python 3 and using the
latest "google.cloud.ndb" Client library to interact with Datastore
I have referred to the following link https://github.com/googleapis/python-ndb
I am new to Google Cloud, and appreciate all the help I can get.
Firstly, it seems to me like you are expecting to use the wrong kind of pagination. You are trying to use numeric values, whereas the datastore cursor is providing cursor-based pagination.
Instead of passing in byte-encoded integer values (like 1 or 2), the datastore is expecting tokens that look similar to this: 'CjsSNWoIb3Z5LXRlc3RyKQsSBFVzZXIYgICAgICAgAoMCxIIQ3ljbGVEYXkiCjIwMjAtMTAtMTYMGAAgAA=='
Such a cursor you can obtain from the first call to the fetch_page() method, which returns a tuple:
(results, cursor, more) where results is a list of query results, cursor is a cursor pointing just after the last result returned, and more indicates whether there are (likely) more results after that
Secondly, you should be using fetch_page() instead of fetch_page_async(), since the second method does not return you the cursors you need for pagination. Internally, fetch_page() is calling fetch_page_async() to get your query results.
Thirdly and lastly, I am not entirely sure whether the "previous page" use-case is doable using the datastore-provided pagination. It may be that you need to implement that yourself manually, by storing some of the cursors.
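To make the cursor-based flow concrete, here is a self-contained sketch that imitates the (results, cursor, more) contract with an in-memory list. It does not use google.cloud.ndb; the fetch_page function below is a hypothetical stand-in, and its cursor simply encodes an offset, which is an assumption for illustration only (real Datastore cursors are opaque tokens). It also shows the idea of keeping earlier cursors around to support a "previous page" link.

```python
import base64

ITEMS = [f"entity-{i}" for i in range(12)]  # pretend query results


def encode_cursor(offset):
    """Turn a position into an opaque-looking token."""
    return base64.urlsafe_b64encode(str(offset).encode()).decode()


def decode_cursor(token):
    return int(base64.urlsafe_b64decode(token.encode()).decode())


def fetch_page(limit, start_cursor=None):
    """Stand-in returning (results, cursor, more) like the tuple above."""
    start = decode_cursor(start_cursor) if start_cursor else 0
    results = ITEMS[start:start + limit]
    end = start + len(results)
    return results, encode_cursor(end), end < len(ITEMS)


# First page: no cursor. Each later page starts from the previous cursor.
page1, cursor1, more1 = fetch_page(5)
page2, cursor2, more2 = fetch_page(5, start_cursor=cursor1)
page3, cursor3, more3 = fetch_page(5, start_cursor=cursor2)

# "Previous page": remember the cursor each page was started with.
start_cursors = [None, cursor1, cursor2]  # index = page number - 1
previous_of_page2, _, _ = fetch_page(5, start_cursor=start_cursors[0])

print(page2, more2)
print(page3, more3)
```

In a web frontend, the token returned with one page would be sent back by the client to request the next one, instead of a page number such as "1" or "2".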
I hope that helps and good luck!

Combining new ArangoSearch views and graph traversals

I've read through the ArangoDB 3.4 docs and the ArangoSearch view tutorial, but I'm still unclear on if/how views can be combined with graph traversals. There is an example of a graph/view join in the tutorial; however, what I need to do is to simply filter the candidate pool resulting from a traversal with a view-based text search. For example:
"for i in 2..2 outbound start_doc edges1, inbound edges2 [filter by view] return i"
The initial 2-hop traversal from the "start_doc" vertex will result in a much smaller candidate pool than the entire collection. I want to then perform a text search on this candidate pool using a configured view (probably "text_en" analyzer).
Would I just define the view expression after the traversal? Or would I need to use a "union_distinct" function to combine the traversal and the search results? (This seems like it would be very inefficient given a potentially very large result set from the view.)
Thanks!
This is how I solved a similar problem, perhaps it will work for you too:
for i in 2..2 outbound start_doc edges1, inbound edges2
  filter (
    for x in view
      search i._key == x._key and search_condition
      limit 1
      return x
  ) != []
  return i
