Using geospatial commands in rethinkdb with changefeed - geospatial

right now I have a little problem:
I want to use geospatial commands (like getIntersecting) together with the changefeed feature of rethinkdb but I always get:
RqlRuntimeError: Cannot call changes on an eager stream in: r.db("Test").table("Message").getIntersecting(r.circle([-117.220406,32.719464], 10, {unit: 'mi'}), {index: 'loc'})).changes()
the big question is: Can I use getIntersecting with the changes() (couldn't find anything related to that in the docs btw ...) or do I have to abandon the idea of using rethinkdb geospatial features and just use change() to get ALL added or changed documents and do the geospatial stuff outside of rethinkdb?

You can't use .getIntersecting with .changes, but you can write essentially the same query by adding a filter after .changes that checks if the loc is within the circle. While .changes limits what you can write before the .changes, you write basically any query after the .changes and it will work.
r.table('Message')
.changes()
.filter(
r.circle([-117.220406,32.719464], 10, {unit: 'mi'})
.intersects(r.row('new_val')('loc'))
)
Basically, every time there is a change in the table the update will get push to the changefeed, but it will get filtered out. Since there is not a lot of support for geospatial and changfeeds, this is more or less how you would need to integrate the two.
In the future, changefeeds will be much broader and you'll be able to write basically any query with .changes at the end.

Related

What is the best way to run a background process in a Dash app?

I have a Dash application that queries an API, based on a user search query, performs some calculations on the response, then displays the final results to the user on a Dash app. In order to provide a quick response to the user, I am trying to set up a quick result callback and a full result long_callback.
The quick result will grab limited results from the API and display results to the user within 10-15 seconds, while the full results will run in the background, collecting all results (which can take up to 2 minutes), then updates the page with the full results when they are available.
I am curious what the best way to perform this action is, as I have run into forking issues with my current attempt.
My current attempt: Using the diskcache.Cache() as the DiskcacheLongCallbackManager and a database txt file to store availability of results.
I have a database txt file that stores a dictionary, with the keys being the search query and the fields being quick_results: bool, full_results: bool, file_path: str, timestamp: dt (as str).
When a search query is entered and submit is pressed, a callback loads the database file as a variable and then checks the dictionary keys for the presence of this search query.
If it finds the query in the keys of the database, it loads the saved feather file from the provided file_path and returns it to the dash app for generation of the page content.
If it does not find the query in the database keys, it requests limited data from the API, runs calculations, saves the DataFrame as a feather file on disk, then creates an entry in the database with the search query(as the key), the file path of the saved feather file, the current timestamp, and sets the quick_results value to True.
It then loads this feather file from the file_path created and returns it to the dash app for generation of the page content.
A long_callback is triggered at the same time as the above callback, with a 20 second sleep to prevent overlap with the quick search. This callback also loads the database file as a variable and checks if the query is present in the database keys.
If found, it then checks if the full results value is True and if the timestamp is more than 0 days old.
If the full results are unavailable or are more than 0 days old, the long_callback requests full results from the API, performs the same calculations, then updates the already existing search query in the database, making the full_results True and the timestamp the time of completion for the full search.
It then loads the feather file from the file_path and returns it to the dash app for generation of the page content.
If the results are available and less then 1 day old, the long callback simply loads the feather file from the provided file_path and returns it to the dash app for generation of the page content.
The problem I am currently facing is that I am getting a weird forking error on the long callback on only one of the conditions for a full search. I currently have the long_callback setup to perform a full search only if the full results flag is False or the results are more than 0 days old. When the full_results flag is False, the callback runs as expected, updates the database and returns the full results. However, when the results are available but more than 0 days old, the callback hits a forking error and is unable to complete.
The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec(). Break on __THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY___YOU_MUST_EXEC__() to debug.
I am at a loss as to why the function would run without error on one of the conditions, but then have a forking error on the other condition. The process that runs after both conditions is exactly the same.
By using print statements, I have noticed that this forking error triggers when the function tries to call the requests.get() function on the API.
If this issue is related to how I have setup the background process functionality, I would greatly appreciate some suggestions or assitance on how to do this properly, where I will not face this forking error.
If there is any information I have left out that will be helpful, please let me know and I will try to provide it.
Thank you for any help you can provide.

Increase column's size in Oracle DB tables with Knex.js

I have a Password's column in a table, stored in OracleDB 11g.
In order to store hashed passwords on it, I need to increment its size from 25 to 60 or 100 BYTE.
I do not want to do this manually, I hope I can find a script or anything else using KnexJS (Something like migrations or seeds)
Thank you.
The correct term for what you want to do is "increase", not "increment". It looks like Knex.js supports changing the default DDL for columns (which is to create) to alter via the alter method. http://knexjs.org/#Schema-alter
In theory, it should work something like this:
knex.schema.alterTable('user', function(t) {
t.string('password', 100).alter();
});
I must admit, the following verbage in this method has me a little concerned:
Alter is not done incrementally over older column type so if you like to add notNull and keep the old default value, the alter statement must contain both .notNull().defaultTo(1).alter().
I'm not sure what that means at the end of the day. Just be sure to test this in development before trying it in production!

First Result Only Using Google-Search-API

I am using abenassi/Google-Search-API https://github.com/abenassi/Google-Search-API to make multiple Google queries in a small python script. I typically only need the first result (link) but the program is built to collect whole pages of results. So far I have been limiting the result as such:
results = google.search(query)
for result in iter(results[0:1]):
loc = result.link
The problem is that the script is slow as a result (I think) of having to wade through the whole page before I get my one link. Does anyone see something obvious I'm missing, or alternately, a simple way to modify the standard_search module https://github.com/abenassi/Google-Search-API/blob/master/google/modules/standard_search.py to limit results to first link only? Thanks!

How to merge nodes and relationships using py2neo v4 and Neo4j

I am trying to perform a basic merge operation to add nonexistent nodes and relationships to my graph by going through a csv file row by row. I'm using py2neo v4, and because there is basically no documentation or examples of how to use py2neo, I can't figure out how to actually get it done. This isn't my real code (it's very complicated to handle many different cases) but its structure is basically like this:
import py2neo as pn
graph = pn.Graph("bolt://localhost:###/", user="neo4j", password="py2neoSux")
matcher = pn.NodeMatcher(graph)
tx = graph.begin()
if (matcher.match("Prefecture", name="foo").first()) == None):
previousNode = pn.Node("Type1", name="fo0", yc=1)
else:
previousNode = matcher.match("Prefecture", name="foo").first())
thisNode = pn.Node("Type2", name="bar", yc=1)
tx.merge(previousNode)
tx.merge(thisNode)
theLink = pn.Relationship(thisNode, "PARTOF", previousNode)
tx.merge(theLink)
tx.commit()
Currently this throws the error
ValueError: Primary label and primary key are required for MERGE operation
the first time it needs to merge a node that it hasn't found (i.e., when creating a node). So then I change the line to:
tx.merge(thisNode,primary_label=list(thisNode.labels)[0], primary_key="name")
Which gives me the error IndexError: list index out of range from somewhere deep in the py2neo source code (....site-packages\py2neo\internal\operations.py", line 168, in merge_subgraph at node = nodes[i]). I tried to figure out what was going wrong there, but I couldn't decipher where the nodes list come from through various connections to other commands.
So, it currently matches and creates a few nodes without problem, but at some point it will match until it needs to create and then fails in trying to create that node (even though it is using the same code and doing the same thing under the same circumstances in a loop). It made it through all 20 rows in my sample once, but usually stops on the row 3-5.
I thought it had something to do with the transactions (see comments), but I get the same problem when I merge directly on the graph. Maybe it has to do with the py2neo merge function finding more identities for nodes than nodes. Maybe there is something wrong with how I specified my primarily label and/or key.
Because this error and code are opaque I have no idea how to move forward.
Anybody have any advice or instructions on merging nodes with py2neo?
Of course I'd like to know how to fix my current problem, but more generally I'd like to learn how to use this package. Examples, instructions, real documentation?
I am having a similar problem and just got done ripping my hair out to figure out what was wrong! SO! What I learned was that at least in my case.. and maybe yours too since we got similar error messages and were doing similar things. The problem lied for me in that I was trying to create a Node with a __primarykey__ field that had a different field name than the others.
PSEUDO EXAMPLE:
# in some for loop or complex code
node = Node("Example", name="Test",something="else")
node.__primarykey__ = "name"
<code merging or otherwise creating the node>
# later on in the loop you might have done something like this cause the field was null
node = Node("Example", something="new")
node.__primarykey__ = "something"
I hope this helps and was clear I'm still recovering from wrapping my head around things. If its not clear let me know and I'll revise.
Good luck.

Using indexed types for ElasticSearch in Titan

I currently have a VM running Titan over a local Cassandra backend and would like the ability to use ElasticSearch to index strings using CONTAINS matches and regular expressions. Here's what I have so far:
After titan.sh is run, a Groovy script is used to load in the data from separate vertex and edge files. The first stage of this script loads the graph from Titan and sets up the ES properties:
config.setProperty("storage.backend","cassandra")
config.setProperty("storage.hostname","127.0.0.1")
config.setProperty("storage.index.elastic.backend","elasticsearch")
config.setProperty("storage.index.elastic.directory","db/es")
config.setProperty("storage.index.elastic.client-only","false")
config.setProperty("storage.index.elastic.local-mode","true")
The second part of the script sets up the indexed types:
g.makeKey("property").dataType(String.class).indexed("elastic",Edge.class).make();
The third part loads in the data from the CSV files, this has been tested and works fine.
My problem is, I don't seem to be able to use the ElasticSearch functions when I do a Gremlin query. For example:
g.E.has("property",CONTAINS,"test")
returns 0 results, even though I know this field contains the string "test" for that property at least once. Weirder still, when I change CONTAINS to something that isn't recognised by ElasticSearch I get a "no such property" error. I can also perform exact string matches and any numerical comparisons including greater or less than, however I expect the default indexing method is being used over ElasticSearch in these instances.
Due to the lack of errors when I try to run a more advanced ES query, I am at a loss on what is causing the problem here. Is there anything I may have missed?
Thanks,
Adam
I'm not quite sure what's going wrong in your code. From your description everything looks fine. Can you try the follwing script (just paste it into your Gremlin REPL):
config = new BaseConfiguration()
config.setProperty("storage.backend","inmemory")
config.setProperty("storage.index.elastic.backend","elasticsearch")
config.setProperty("storage.index.elastic.directory","/tmp/es-so")
config.setProperty("storage.index.elastic.client-only","false")
config.setProperty("storage.index.elastic.local-mode","true")
g = TitanFactory.open(config)
g.makeKey("name").dataType(String.class).make()
g.makeKey("property").dataType(String.class).indexed("elastic",Edge.class).make()
g.makeLabel("knows").make()
g.commit()
alice = g.addVertex(["name":"alice"])
bob = g.addVertex(["name":"bob"])
alice.addEdge("knows", bob, ["property":"foo test bar"])
g.commit()
// test queries
g.E.has("property",CONTAINS,"test")
g.query().has("property",CONTAINS,"test").edges()
The last 2 lines should return something like e[1t-4-1w][4-knows-8]. If that works and you still can't figure out what's wrong in your code, it would be good if you can share your full code (e.g. in Github or in a Gist).
Cheers,
Daniel

Resources